1 Introduction

The high-efficiency video coding (HEVC) standard is developed by the Joint Collaborative Team on Video Coding (JCT-VC) to increase the coding efficiency [1]. HEVC reduces the bit rate by nearly 50 % compared to its previous standard, H.264/AVC [2]. On the other hand, the complexity of the encoder is greatly increased. In HEVC, the use of high-performance coding tools has increased the complexity while improving the video quality and obtaining a higher compression ratio. HEVC uses larger block sizes which enables higher compression especially for high-resolution sequences. A Coding tree unit (CTU) is the basic block of size \(L\times L\), where L is 64, 32 or 16. Large sized CTUs yield higher compression. A CTU is divided into Coding Units (CUs) in a quad tree structure. These CUs are further divided into Prediction Units (PUs) and Transform Units (TUs) in a tree structure. Each of the CTUs, CUs, PUs and TUs has one luma component and two chroma components and they are called as Coding Tree Blocks (CTBs), Coding Block (CBs), Prediction Block (PB) and Transform Block (TB), respectively. For deciding the block size, rate-distortion (R-D) optimization is done for all the possible combinations of CUs, PUs and TUs. Due to the recursive tree-structured partitioning of CTUs to CUs and then CUs to PUs and TUs, the complexity of motion estimation in HEVC is increased tremendously.

As it is well known, intra coding reduces spatial redundancy and inter coding reduces temporal redundancy in consecutive frames through motion estimation and motion compensation. The motion estimation complexity is increased by many folds in HEVC compared to H.264/AVC. Due to the huge increase in computational complexity of the codec, it cannot be used for low-end devices having hardware and power constraints and real-time applications. Therefore, the complexity has to be reduced to a great extent without significant degradation of video quality and increase in bit rate.

In HEVC, the complexity of intra coding can be reduced by fast intra mode decision algorithms proposed in [35]. In inter coding, the complexity can be reduced by fast CU mode decision algorithms and by reducing the complexity in the block-matching process of motion estimation. There are many fast inter mode decision algorithms to reduce the motion estimation complexity [611]. In the block-matching phase, the reduction in motion estimation complexity can be achieved by reducing the number of search points, using the low complexity distortion measure, by using partial sum of absolute difference [12] and early termination of motion estimation process.

To reduce the motion estimation complexity in the searching process of the best matching reference block, the number of search points can be reduced either by using fast search patterns or by using an adaptive search range algorithm. Fast search patterns like three-step search [13], diamond search [14], Hexagonal search [15] and gradient descent search [16] algorithms reduce the number of search points to a great extent. But, there is a severe quality degradation and increase in bit rate, especially for high-motion activity sequences. Moreover, these fast search pattern algorithms are difficult to implement in hardware. Adaptive search range algorithms, on the other hand, vary the search range depending upon the predicted motion activity of the current block using neighboring information and they are hardware friendly.

Full search provides an optimum video quality, but it incurs a heavy computational burden. In order to reduce the complexity of motion estimation, the Test zonal search (TZS) motion estimation algorithm [17] is employed in the fast search mode of the motion estimation module in the HM encoder. TZS motion estimation reduces the complexity to a great extent with a negligible loss in coding efficiency. In uni-prediction, the motion compensation is carried out using one reference frame list, whereas in bi-prediction, it uses two reference frame lists [18, 19]. The use of two reference frame lists provides better coding efficiency. In this paper, complexity reduction algorithms are proposed for fast motion estimation of HEVC in uni-prediction, bi-prediction and while using both. These models are applied in the fast search mode of the HEVC encoder. An overview of the TZS algorithm and related work on fast motion estimation are presented in Sect. 2. In Sect. 3, the proposed fast motion estimation technique is detailed. Experimental results and related discussion are described in Sect. 4. Finally, Sect. 5 concludes the paper.

2 Related work

2.1 Overview of TZS motion estimation

The TZS motion estimation (TZSME) algorithm consists of four different stages: motion vector prediction, initial grid search, raster search and refinement search.

2.1.1 Motion vector prediction

Initially, the best motion vector predictor is obtained out of median predictor, left predictor, up predictor and upper right predictor. The predictor having the minimum cost is chosen as the best motion vector predictor and used as the initial search point. The maximum number of search points in this stage is the number of motion vector predictors used.

2.1.2 Initial grid search

In this stage, the diamond search or square search pattern is used with the stride length varying from 1 to the maximum search range (\(\text{SR}_{\max }\)) in powers of 2. Among all the search points, the one with the minimum cost is taken as the best search point. The maximum number of search points in this stage is

$$N_\text{grid} = 1+8\times \lfloor (\log _{2}(\text{SR}_{\max })+1)\rfloor$$
(1)

2.1.3 Raster search

Raster search is a simple full search on a down-sampled version of the search window. Raster search is done with a distance of ”Rasterdistance”. Raster search is performed if the ”Bestdistance” obtained from the previous stage is greater than ”Rasterdistance”. Otherwise, this stage is skipped. For a PB, if the raster search stage is not skipped, then the complexity of this stage is very high compared to the other stages. The maximum number of search points in this stage is

$$N_\text{raster} = \left\lceil \frac{(2\times {\text{SR}}_{\max }+1)}{\text{Rasterdistance}}\right\rceil ^2$$
(2)

2.1.4 Refinement search

In the refinement search stage, either raster refinement or star refinement is used. The search pattern used in the initial grid search is used for refinement of the motion vector in star refinement. The maximum number of search points in this stage is not fixed. It varies differently for raster and star refinements.

For \(\text{SR}_{\max }\) of 64, the maximum number of search points in the grid search stage and raster search stage are 53 and 625, respectively. Therefore, the raster search stage has huge complexity compared to the other stages, especially if \(\text{SR}_{\max }\) is high.

The HM encoder uses an optimized version of the TZS algorithm. The motion estimation complexity of the optimized version of TZS algorithm is much lower than the TZS algorithm. It uses an advanced motion vector predictor (AMVP) and the zero predictor in the motion vector prediction stage. In the latest version of the HM encoder, the upper mode motion vector is also used. In grid search and star refinement stage, it has 16 search points for the diamond and square patterns of distance >8. The grid search stage is optimized with early grid search termination. From here onward, TZS refers to the optimized version of the TZS algorithm and TZSu refers to the optimized TZS using the upper mode motion vector in the motion vector prediction stage.

Table 1 shows that the use of the upper mode motion vector as one of its predicted motion vectors in TZS (TZSu) reduces the complexity by 48.7 % on an average in terms of the reduction in number of search points (\({\Delta } N\)) and 44.52 % on an average in terms of motion estimation time saving (\({\Delta } \text{METS}\)) for class B sequences. Here, both TZS and TZSu are optimized. Reduction in encoding time (\({\Delta } \text{ET}\)) is 10.06 %. The BD-Rate and BD-PSNR are 0.167 and −0.0021, respectively. This implies that there is a significant reduction in complexity without much impact on the video quality and compression. This is due to the fact that the upper mode motion vector is highly correlated to its lower mode motion vector. In the recent literature on HEVC [2022] full search algorithm is used for testing the performance of the adaptive search range algorithm. In [23] and [24], TZS is used for testing the performance of the adaptive search range algorithms. The full search algorithm has a huge complexity. TZS has much less complexity compared to full search and the performance is almost the same as that of full search in terms of the visual quality as well as compression. But, still the complexity of TZS is too high to make it feasible for real-time applications, especially for high-motion activity and high definition sequences. Hence, we targeted the TZS module for complexity reduction in uni-prediction.

Table 1 Comparison of TZSu with TZS for first 100 frames of Class B sequences in LD-P Main profile

2.2 Adaptive search range algorithm

Adaptive search range algorithms reduce the motion estimation complexity by reducing the search range based on statistics of the neighboring blocks or the previous frame. Distortion and motion information are used as the statistical parameters to estimate the search range. The dynamic padding window size (DPWS) motion estimation proposed in [25] is based on the sum of absolute difference (SAD) between the current block and the reference block at the predicted motion vector position. This algorithm reduces the search range, thereby reducing the complexity with a little loss of video quality. In [23], a linear adaptive search range model is proposed. The search range is varied as a function of the predicted motion vector and motion consistency. The coefficients for finding the search range vary for different prediction unit modes and coding unit modes. In [20], the motion vector difference (MVD) of each of the blocks in the previous frame is modeled using a Laplacian distribution. In [22], a Cauchy distribution-based adaptive search range (CASR) algorithm is proposed where the author has claimed that the Cauchy distribution is better than the Laplacian distribution to model the probability distribution of the MVDs. In this algorithm, the distribution of the motion vector differences in the previous frame is used to estimate the parameters in the Cauchy distribution. Based on this model, the search range is fixed for the current frame. Also, the search range is increased to some extent if the predicted motion activity is high for the current frame. In [24], the maximum of all the MVDs in the spatial, temporal and upper mode neighbors are used to predict the search range in the dynamic search range (DSR) algorithm. The predicted search range is clipped in between 16 and 64. This method reduces the complexity with very little effect on the bit rate and PSNR.

2.3 External stop search algorithm

The external stop search algorithms terminate the searching process of motion estimation when the minimum cost of the current search point is less than a threshold. The threshold is derived from the cost of the neighboring blocks. In [25, 26], the external stop search algorithm and the dynamic external stop search algorithm reduce the complexity of motion estimation by terminating the search process when the threshold condition is satisfied. This algorithm is applicable for full search motion estimation and moreover, the cost of the neighboring blocks for all the block sizes needs to be stored in order to calculate the threshold.

2.4 Fast bi-prediction motion estimation algorithm

The HM encoder has a search range of 64 for uni-prediction motion estimation. It has a search range of 4 for bi-prediction motion estimation. Though the search range is small for bi-prediction, with the use of the fast search algorithm in HEVC, the motion estimation complexity in bi-prediction is comparable to that of uni-prediction. Table 2 shows the comparison of motion estimation time in uni-prediction and bi-prediction. For class B sequences, the percentage of the number of search points in uni-prediction, (\(\%N_\text{uni}\)) and bi-prediction, (\(\%N_\text{bi}\)) compared to the total number of search points are 77.07 and 22.93 %, respectively. The percentage of the uni-prediction motion estimation time (\(\%\text{MET}_\text{uni}\)) and bi-prediction motion estimation time (\(\%\text{MET}_\text{bi}\)) compared to the total motion estimation time are 75.37 and 24.63 %, respectively. Therefore, it is also necessary to reduce the bi-prediction motion estimation (BME) complexity in the HM encoder. In [23], a motion analysis-based adaptive search range algorithm (MAASR) is proposed to reduce the bi-prediction motion estimation complexity. This algorithm is simple and reduces the complexity of bi-prediction motion estimation with a minor loss in coding efficiency. In [27], an SAD-based bi-prediction selection method is proposed to reduce the bi-prediction motion estimation complexity.

Table 2 Comparison of number of search points and motion estimation time in uni-prediction and bi-prediction for Class B sequences in LD-B Main profile at QP = 32

In this paper, our goal is to reduce the complexity of TZS algorithm in uni-prediction and BME algorithm in bi-prediction by limiting the search range externally and internally.

3 Proposed algorithm

3.1 External search range reduction for TZS

Our proposed external search range reduction (ESR) algorithm is intended to reduce the motion estimation complexity while maintaining the coding efficiency. In motion estimation, most of the PBs find their best matching reference PBs nearer to the center of the search window. Using the entire search range for all the PBs involves unnecessary computation of SADs. In the proposed approach, the search range is adaptively varied based on the MVDs of the spatial neighboring blocks and the SAD at the search center. The MVDs in x and y directions are computed as follows:

$$\begin{aligned} \text{MVD}_x&= \text{MV}_x - \text{PMV}_x \\ \text{MVD}_y&= \text{MV}_y - \text{PMV}_y \end{aligned}$$
(3)

where \(\text{MVD}_x\) and \(\text{MVD}_y\) are MVDs in x and y direction. \(\text{MV}_x\) and \(\text{MV}_y\) are motion vectors in x and y direction. \(\text{PMV}_x\) and \(\text{PMV}_y\) are predicted motion vectors in x and y directions.

3.1.1 Search range using spatial neighbors

The spatial neighboring blocks and the current block may belong to the same object. So, the motion of the current block and the spatial neighboring blocks are highly correlated, especially if the neighboring blocks and the current blocks share the same object. The upper block and the left block are more correlated to the current block. In order to predict the search range using spatial neighbors, the distance of MV from PMV of these two blocks are used. Additionally, the distance of MV from PMV of the upper right block is also used to predict the search range. Here, the block left to the bottom left pixel of the current block is used as the left block instead of block left to the top left pixel of the current block. This is because the block left to the bottom left pixel of the current PB is less correlated to the other neighbors used. Now, the spatial neighbors are less correlated to each other but are correlated to the current block.

$$d(x,y)= {\sqrt{ \text{MVD}_X^2 + \text{MVD}_Y^2 }}$$
(4)
$$\text{SR}_1= k_1\times \max \left( d_\text{L} ,d_\text{U} ,d_\text{UR} \right)$$
(5)

where \(d_\text{L}\), \(d_\text{U}\) and \(d_\text{UR}\) are the distance of MV from PMV of left, upper and upper right blocks. \(k_1\) is a constant.

3.1.2 Search range using motion vector difference distribution of previous frame

It is claimed that the motion vector differences follow the Cauchy distribution [22]. Moreover, the motion of the current frame is highly correlated to the motion of the previous frame. The distribution of MVDs in the previous frame is used to predict the search range of the current frame. Let the predicted search range for the current frame in x and y directions be \(C_x\) and \(C_y\). \(C_x\) and \(C_y\) are obtained from the Cauchy distribution of the MVDs of the previous frame with the Cauchy parameter chosen as 2 [28] and the predefined probability constant given in [22] chosen as 99.2 %. In a frame, the motion is not same for all the blocks. So, the predicted search range has to be updated based on the neighboring motion information. Maximum of the neighboring blocks MVD, maxMVD is used to update the search range for the current block. For deriving the maxMVD, left (the block at the left of the bottom left pixel of the current block), upper and upper right neighbors are used. The search range is found as a function of the maximum of the updated search range in x and y directions as given in (6).

$$\begin{aligned} \text{SR}_2&= \max ( C_x+ k_2\times \text{maxMVD}_x , \\&\qquad \,C_y+ k_2\times \text{maxMVD}_y ) \end{aligned}$$
(6)

where \(k_2\) is a constant. In RA mode, upto two neighboring frames are available for each frame, and one is in the forward direction and the other in the backward direction. Each of \(C_x\) and \(C_y\) is obtained as the maximum of these two search ranges, respectively, found from the two nearest available neighboring frames.

3.1.3 Search range using SAD at search center

The motion vector of the current block is proportional to the SAD at the PMV \((\text{SAD}_\text{PMV})\). This implies that for the PB having a large motion vector, \(\text{SAD}_\text{PMV}\) is higher and for the PB having a small motion vector, \(\text{SAD}_\text{PMV}\) is lower. In TZS of the HM-16.6 encoder, the grid search is done from the best search center. The best search center is the the one which has the minimum cost among the AMVP, zero and depth predictors. \(\text{SR}_3\), which is the search range, uses the SAD of the best search center. The search range is fixed based on the normalized \(\text{SAD}_\text{PMV}\) with respect to the size of the block and is given in (7).

$$\text{SR}_3 = k_3\times \frac{\text{SAD}_\text{PMV}}{n_1 \times n_2}$$
(7)

where \({n_1 \times n_2}\) is the block size. \(k_3\) is a constant set to 1. The value of \(k_3\) should be small for low resolution sequences and large for high-resolution sequences. However, in this work it is fixed for all the sequences.

Finally, the maximum of all the three predicted search ranges is chosen as the search range as given in (8).

$$\text{SR}=\max ( \text{SR}_1, \text{SR}_2, \text{SR}_3)$$
(8)

In order to avoid the degradation in video quality, the search range (SR) is further modified as given in (9).

$$\text{SR} = \max \left( \text{SR}, 8 + \max \left( C_x ,C_y \right) \right)$$
(9)

The above equation implies that the minimum value of SR is 8. The term \(\max \left( C_x ,C_y \right)\) specifies the motion information of the previous frame. If the previous frame has high motion, then the current frame is also expected to have high motion and the minimum value of SR is \(8+\max \left( C_x ,C_y \right)\). But, this can increase the complexity if the previous frame has high-motion activity and the current PB is of low motion. So, the minimum value of SR is limited to \(8 + \min \left( a, \max \left( C_x ,C_y \right) \right)\) with the value of a set to 8. So, the search range is computed as specified in (10).

$$\text{SR} = \max \left( \text{SR}, 8 + \min \left( a, \max \left( C_x ,C_y \right) \right) \right)$$
(10)

Here, the values of \(k_1\), \(k_2\), \(k_3\), and a affect the motion estimation complexity and video quality. In order to vary the motion estimation complexity, the value of these constants should be varied accordingly to achieve a better trade off between the complexity and quality.

3.2 Internal search range reduction for raster search of TZS

In the HM encoder, the raster search stage is carried out only if in the grid search stage of TZS the motion vector is greater than raster distance. The raster search stage of TZS is more complex compared to the grid search and refinement search stages. The default value of raster distance in the raster search stage is 5. This implies that the search points in raster search are apart from its neighboring search points by a distance of 5 pixels horizontally, vertically and diagonally. As the raster search stage involves a huge complexity, we have proposed an internal search range reduction approach, namely, ISR-R to reduce the complexity of raster search in TZS.

The ISR-R algorithm is similar to the external stop search algorithm proposed in [25]. The external stop search algorithm is suitable for full search motion estimation and it needs to store the cost for all the block sizes as it is dependent on the cost of its neighboring blocks of the same size. ISR-R does not need neighboring cost information and is suitable for the raster search stage of TZS.

In ISR-R raster search is done in a spiral scan order. In Fig. 1, there are 4 rounds for a search range of 16. The search center of raster search is numbered as 0. The search point numbered 0 corresponds to round 0. The search points numbered as 1 corresponds to round 1 and so on. For a search range of 64, there are 13 rounds.

Fig. 1
figure 1

Raster search pattern in spiral scan order with raster distance of 5 for a search range of 16

The ISR-R algorithm starts calculating the cost from round 1. Let, \(J_{\min } \left( 0 \right)\) correspond to the minimum cost in round 0. Round 0 is the predicted motion vector position and its cost is calculated earlier. The minimum cost in the grid search stage of TZS is less than or equal to the minimum cost at round 0. The cost of all the search points in round 1 (numbered as 1 in Fig. 1) are calculated and the minimum cost among all the search points in this round is stored as \(J_{\min } \left( 1 \right)\). If \(r-1\) is the round having the minimum cost, and \(J_{\min } \left( r-1 \right)\) is greater than the minimum cost obtained in the grid search stage, then the search process continues in the next round. If \(J_{\min } \left( r-1 \right)\) is less than or equal to the minimum cost obtained in the grid search stage, then the search process is terminated for the raster search stage if the minimum cost of the current round is greater than the minimum cost of the previous round as given in (11). Otherwise, the search process continues in the next round.

$$J_{\min }(r-1) < J_{\min }(r)$$
(11)

There can be a case where the minimum cost of the previous round is less than the minimum cost of the current round, but is greater than the minimum cost in the next round. If the threshold given in (11) is used, then the raster search process terminates quickly without detecting the actual minimum cost search point. This causes false skipping of the raster search.

Figure 2 shows the error surface for a \(32\times 16\) PB in the third frame of the Class D sequence, RaceHorses in the raster search stage. The error surface over the raster search stage has costs only at the numbered search points in a search range. This PB belongs to the eleventh CTU at a pixel position of (192, 64) in the frame. As raster search is done in a spiral scan manner, the minimum cost in each round, \(J_{\min }(r)\) and its corresponding best MV of round r are shown in Table 3. Table 3 shows that the minimum cost, \(J_{\min }(r)\) decreases with an increase in r till r = 5 and then it increases for r = 6. With the condition given in (11), the raster search is terminated resulting in the minimum cost at r = 5 with \(J_{\min }(5) = 6165\) at (−15, −9). But, the actual minimum cost is at r = 10 with \(J_{\min }(10) = 5493\) at (60, −24). Due to this, the best raster MV is (−15, −9) instead of (60, −24). This is termed as false termination of raster search. From Fig. 2b, the actual best raster MV and the raster MV due to false termination of raster search are far apart. Therefore, the minimum cost of the current round should not be compared with a threshold equal to that of the minimum cost of the previous round. The current threshold has to be reduced in order to minimize the chance of false termination of raster search. For this (11) is modified by introducing a scaling factor of \(\frac{7}{8}\) as shown in (12). This scaling factor is used in order to avoid the false termination of raster search so that the reduction in quality is negligible. Also, \(\frac{7}{8}\times J_{\min }(r)\) is nearly equal to \(( J_{\min }(r) - (J_{\min }(r)>>3) )\).

$$J_{\min }(r-1) < \frac{7}{8}\times J_{\min }(r)$$
(12)

From round 2, if \(J_{\min } \left( r-1 \right)\) is less than or equal to the minimum cost obtained in the grid search stage, then the minimum cost of the current round is compared with the cost of rounds \(r-1\) and \(r-2\) as shown in (13) and (14). If both the conditions are satisfied, then raster search is terminated. Otherwise, the condition given in (12) is checked to terminate raster search. This process is repeated till all the rounds in the search range are completed or till the raster search stage is terminated.

$$J_{\min }(r-2)< J_{\min }(r-1)$$
(13)
$$J_{\min }(r-2)< J_{\min }(r)$$
(14)

The steps followed in the ISR-R algorithm are as follows:

  1. 1.

    Store the cost at the search center of raster search as \(J_{\min } \left( 0 \right)\) and set r = 1.

  2. 2.

    Calculate the cost of all the search points for the current round and obtain the minimum cost \(J_{\min } \left( r \right)\).

  3. 3.

    If \(J_{\min } \left( r-1 \right)\) is less than or equal to the minimum cost obtained in the grid search stage, then if r = 1 go to step 4; else, go to step 6. Otherwise, r = r + 1 and go to step 2.

  4. 4.

    If the condition given in (12) is satisfied, then go to step 9. Otherwise, go to step 5.

  5. 5.

    r = r + 1. Calculate the cost of all the search points in the current round and obtain the minimum cost \(J_{\min } \left( r \right)\).

  6. 6.

    If the conditions given in (13) and (14) are satisfied, then go to step 8. Otherwise, go to step 7.

  7. 7.

    If the condition given in (12) is satisfied, go to step 9. Otherwise, go to step 8.

  8. 8.

    If the current round is the last round, then go to step 9. Otherwise, go to step 5.

  9. 9.

    Terminate raster search and perform refinement search.

Fig. 2
figure 2

Error surface for 32 × 16 prediction block in the third frame of Class D RaceHorses sequence in raster search stage. a Error surface plot over a search range. b Error surface from top view with white dots showing that the actual best raster MV (60, −24) and the raster MV due to false termination of raster search (−15, −9) are far apart

Table 3 Minimum cost of all rounds in the error surface of a 32 × 16 prediction block in the third frame of the Class D RaceHorses sequence in raster search

3.3 Fast bi-prediction motion estimation algorithm

In the HM encoder, the motion estimation process for bi-prediction is performed over the search range of 4 around the uniprediction motion vector. The complexity of motion estimation in bi-prediction can be reduced at the following two stages:

  1. 1.

    External search range reduction for bi-prediction

  2. 2.

    Internal search range reduction for bi-prediction

3.3.1 External search range reduction for bi-prediction

Figure 3a shows the proposed external search range reduction approach based on the uni-prediction motion vector. If the maximum of the MVDs of the x and y components is <2, then the search range is fixed as shown in (15)

$$\begin{aligned} \text{SR}&= \max (|MVDx|, |MVDy|) + 1 \\&\qquad \text{if} \quad \max (|MVDx|, |MVDy|) < 3 \\&= 4 \qquad \text{otherwise} \end{aligned}$$
(15)
Fig. 3
figure 3

Fast bi-prediction motion estimation algorithm. a External search range reduction for bi-prediction. b Internal search range reduction for bi-prediction

3.3.2 Internal search range reduction for bi-prediction

The internal search range reduction for bi-prediction is shown in Fig. 3b. After fixing the search range, motion estimation is done in a spiral scan manner instead of raster scan. From Fig. 4, it is observed that there are a total of 5 rounds in the search range with the rounds numbered from 0 to 4. In the motion estimation process, the minimum cost among all the previous search points is taken as the best cost. But, here the best cost is calculated for each round at the end of the motion estimation process of that round. After calculating the best cost for the first two rounds, if the best cost at round 0 is less than the best cost at round 1, then the motion estimation process is terminated. Otherwise, the best cost of the next round is calculated and compared with the best cost of the current round. This process is continued till the internal search range reduction condition, given in (16) is satisfied or till the search points corresponding to all the rounds are used for calculating the cost.

$$J_{\min }(r-1) < J_{\min }(r).$$
(16)
Fig. 4
figure 4

Spiral search in bi-prediction search range

4 Experimental results

The proposed algorithm is implemented in the HM-16.6 encoder [29]. All the experiments are conducted on an Intel(R) Xeon CPU having a 2.5 GHz clock and 32 GB RAM. Standard test video sequences [30] of Class B, C, D and E given in [31] are used. These sequences, respectively, have resolutions of \(1920\times 1080,\, 832\times 480,\, 416\times 240\) and \(1280\times 720\). All the test sequences have 8 bits per sample bit depth with 4:2:0 chroma subsampling. First 100 frames of each test sequence are used for encoding. The proposed algorithm is implemented in the fast search mode of the HM-16.6 encoder. LD-P main configuration is used for testing the performance of ESR and ISR-R algorithms. The LD-B main configuration is used for testing the performance of the FBME algorithm. The combined algorithms ESR + ISR-R is implemented in the LD-P main profile. The combined and ESR + ISR-R + FBME is implemented in LD-B main and RA main profiles. The number of reference frames used is 4 and the maximum search range is set to 64. Fast encoder decision is enabled. All the simulations are carried out using Microsoft Visual Studio 2013 in the release mode. Based on the common test condition defined in the HEVC standard document, all the sequences are encoded at quantization parameter values of 22, 27, 32 and 37.

Bjontegaard Delta Rate (BD-Rate) and Bjontegaard Delta PSNR (BD-PSNR) are the metrics used for calculating the coding efficiency [32]. In order to obtain the BD-Rate and BD-PSNR values of the test sequence, the PSNR values and the bit rates obtained by encoding at the above mentioned quantization parameter values are used [32].

The percentage reduction in motion estimation time, denoted by \({\Delta } MET\) and the percentage reduction in number of search points (\({\Delta } N\)) of the proposed algorithm over TZS in the HM-16.6 encoder are given in (17) and (18), respectively.

$${\Delta } \text{MET}= \dfrac{\text{MET}_\text{HM}-\text{MET}_\text{Proposed}}{\text{MET}_\text{HM}} \times 100\,\%$$
(17)
$${\Delta } N= \dfrac{N_\text{HM}-N_\text{Proposed}}{N_\text{HM}} \times 100\,\%$$
(18)

where \(\text{MET}_\text{HM}\) and \(\text{MET}_\text{Proposed}\) are, respectively, the motion estimation times of the TZS algorithm in HM-16.6 and the proposed algorithm. Here, \(\text{MET}_\text{Proposed}\) includes the overhead time taken by the proposed algorithm and the motion estimation process. \(N_\text{HM}\) and \(N_\text{Proposed}\) are, respectively, the number of search points used for motion estimation in the TZS algorithm of HM-16.6 and the proposed algorithm. \({\Delta } \text{MET}_\text{uni}\), \({\Delta } \text{MET}_\text{bi}\) and \({\Delta } \text{MET}_\text{total}\) denote the percentage reductions in motion estimation time for uni-prediction, bi-prediction and both: \({\Delta } N_\text{uni}\), \({\Delta } N_\text{bi}\) and \({\Delta } N_\text{total}\) denote the corresponding percentage reduction in the number of search search points.

In this work, the constants \(k_{1}\) and \(k_{2}\) are set to 2 and 1, respectively. We have compared the performance of our proposed ESR algorithm with the DSR algorithm proposed by Nalluri et al. [24] in the fast search mode of the HM encoder. Table 4 shows the performance of the proposed ESR algorithm and the DSR algorithm compared to that of original TZS algorithm of HM-16.6 in the LD-P main profile. Our proposed algorithm has a motion estimation time of 44.91 % compared to TZS. The reduction in number of search points is 45.33 % for ESR whereas the reduction achieved in DSR is 27.59 % only. Though the reduction of motion estimation time is more, there is negligible loss in coding efficiency.

Table 4 Performance of the DSR algorithm and the proposed ESR algorithm compared to TZS in the HM 16.6 encoder in the LD-P main profile

Table 5 shows the performance of the proposed ISR-R algorithm implemented in the LD-P main profile of HM-16.6. Compared to TZS, the ISR-R algorithm has reduced the number of search points and motion estimation time significantly without effecting the coding efficiency. The number of search points and motion estimation time are reduced by 22.95 and 18.54 %, respectively.

Table 5 Performance of ISR-R algorithm compared to TZS in the HM 16.6 encoder in the LD-P main profile

Table 6 shows the performance of the proposed FBME algorithm implemented in the LD-B main profile in the fast search mode. The efficiency of FBME is compared with the efficiency of MAASR, proposed by Du et al. [23]. The MAASR algorithm is able to reduce the number of search points and motion estimation time in bi-prediction by 68.41 and 63.56 %, respectively. Our FBME algorithm has reduced the number of search points and motion estimation time by 79.80 and 76.06 %, respectively. Further, FBME has a smaller impact on the coding efficiency compared to that of the MAASR algorithm.

Table 6 Performance of the MAASR algorithm and the proposed FBME algorithm compared to BME in the HM 16.6 encoder in the LD-B main profile

In Table 7 the performance of the combined ESR and ISR-R algorithms is compared with TZSu implemented in the HM-16.6 encoder. Both the algorithms when combined are able to reduce the number of search points and motion estimation time by 48.88 and 47.62 %, respectively. In comparison TZSu is able to reduce the number of search points and motion estimation time by 41.87 and 38.88 %. Also, from Table 8 we see that the combination of ESR, ISR-R and TZSu is able to reduce the number of search points and motion estimation time by 60.23 and 58.46 %, respectively, with negligible loss of coding efficiency.

Table 7 Performance of the TZSu and the proposed ESR + ISR-R algorithm compared to TZS in the HM 16.6 encoder in the LD-P main profile
Table 8 Performance of ESR + ISR-R + TZSu algorithm compared to TZS in the HM 16.6 encoder in the LD-P main profile

Finally, the combination ESR, ISR-R and FBME algorithms is implemented in the LD-B main profile and RA main profile in the fast search mode. To evaluate the performance in uni-prediction and bi-prediction, the number of search points and motion estimation time are shown separately for uni-prediction, bi-prediction and both in Tables 9 and 10. In LD-B main profile, the combined algorithm has reduced the number of search points by 49.44, 80.10 and 60.58 % in uni-prediction, bi-prediction and both. Corresponding motion estimation times are reduced by 48.66, 75.75 and 59.58 %. BD-Rate and BD-PSNR are 0.1925 and −0.005 %. In [23], the total motion estimation time saving is less with a small loss of coding efficiency. In comparison, our proposed algorithms have more saving in the total motion estimation time with negligible loss in coding efficiency. In the RA main profile, the number of search points is reduced by 48.41, 82.08 and 59.55 % in uni-prediction, bi-prediction and both. Corresponding motion estimation times are reduced by 45.54, 78.35 and 57.71 %. BD-Rate and BD-PSNR are 0.2648 and −0.0083 %. From Tables 9 and 10, it is clear that the motion estimation complexity reduction is more for bi-prediction.

Table 9 Performance of ESR + ISR-R + FBME algorithms compared to TZS+BME in the HM 16.6 encoder in the LD-B main profile
Table 10 Performance of ESR + ISR-R + FBME algorithms compared to TZS+BME in the HM 16.6 encoder in the RA main profile

The proposed algorithms are implemented in the optimized version of the TZS algorithm. The ESR and ISR-R algorithms provide even higher saving in motion estimation complexity when implemented within the unoptimized version of the algorithm.

5 Conclusion

Complexity reduction tools for fast motion estimation in both uni-prediction and bi-prediction is proposed in this work. The proposed adaptive search range algorithm reduces the motion estimation time with negligible loss in video quality. ESR has reduced the motion estimation complexity with negligible loss of coding efficiency. ISR-R has reduced the complexity of motion estimation by reducing the raster search complexity of TZS. Bi-prediction fast motion estimation scheme has reduced the motion estimation complexity in bi-prediction without affecting the video quality and bit rate. The results obtained by the proposed algorithms are significant in terms of percentage reduction in number of search points and percentage reduction in motion estimation time. Experimental results illustrate that the total motion estimation time is reduced to a great extent while maintaining the coding efficiency as that of the HM-16.6 reference software in the fast search mode.

In the future, our goal is to develop an adaptive search range algorithm for uni-prediction that can provide a better trade-off between the motion estimation complexity and the compression efficiency.