1 Introduction

Compared with previous standards, H.264/advanced video coding (AVC) achieves significant compression gain by using various advanced techniques [7, 18]. However, these techniques also lead to higher computational complexity. One of the major improvements is the use of rate-distortion (RD) optimization for both intra and inter mode decision, which checks all the possible intra and inter modes to find the one with the highest coding efficiency. For intra prediction, it remains a challenging task to reduce the coding time while retaining the coding performance. Intra coding supports three macroblock partitions, 4×4, 8×8, and 16×16, each with a certain number of prediction modes. Selecting fewer block-size types and prediction modes is normally a good solution to this task. As larger block sizes fit smooth regions whereas smaller block sizes suit regions with rich detail, block-size types are commonly reduced by thresholds derived from the coding content. However, things become complicated when efforts are made to reduce the prediction modes within each block-size type, because the best mode is the one that not only has the smallest distortion but also requires fewer coding bits. It is not straightforward to predict the required bitrate precisely, as the bitrate is obtained through a complicated chain of procedures: transformation, quantization, inverse transformation, inverse quantization, and entropy coding. In addition, the entropy coding of H.264/AVC involves context information, which makes the prediction more complicated [10]. Rate estimation is still under research and cannot be directly used for fast intra mode decision [5]. In this paper, we focus on edge-detection-based distortion prediction for realizing fast intra mode decision in H.264/AVC.

The use of edge detection in intra coding has been well elaborated in the literature [2, 3, 6, 9, 13, 14, 16, 17, 19]. Most of the edge detection methods utilize the local properties of the video content to find the best prediction direction for intra coding. A variety of edge filters are designed in those methods, and the accuracy of mode direction prediction depends on these filters. It is therefore important to assign a unique edge filter to each direction, which has been done in our previous work [2]. More details about the edge filters of those methods are given in Section 2.

In this work, we present an improved hybrid mode decision method that further reduces the computational requirement of intra coding while maintaining its rate-distortion performance. In the proposed method, the edge-filtered results are summed with the prediction residuals according to the modes they represent, which selects the probable modes efficiently and avoids the sort operations at the stage of individual pixel processing. Inspired by [6], we also incorporate the most probable mode (MPM), which proves useful in the proposed method. Our experimental results show that the proposed algorithm outperforms previous methods in terms of coding efficiency.

The rest of this paper is organized as follows. In Section 2, we give a brief review of H.264/AVC intra coding and the related edge detection methods in the literature. The proposed algorithm is introduced in Section 3. Experimental results and discussion are presented in Section 4, followed by a brief conclusion in Section 5.

2 Review of the H.264/AVC intra coding and related works

2.1 Intra coding scheme in H.264 standard

In this subsection, the intra coding of H.264/AVC [18] is briefly reviewed for supporting the presentation of the proposed method. For the luma samples, H.264/AVC supports three block-size types in high profile: luma 4×4 (I4MB), luma 8×8 (I8MB), and luma 16×16 (I16MB). I4MB supports eight directional modes as well as DC mode as shown in Fig. 1. I8MB shares the same prediction modes with I4MB except that the coding block size is 8×8. I16MB contains three modes with directions: vertical, horizontal, and plane. In this work, plane mode is simply treated as the diagonal down-left mode that is the mode 3 as shown in Fig. 1b. For the chroma intra prediction, only 8×8 block (C8MB) is supported and the prediction modes are the same as those of I16MB.

Fig. 1 a A 4×4 block and its referenced neighbor pixels. b Eight directional modes for I4MB and I8MB prediction

2.2 Related works on the design of edge filters

In [13], the Sobel operator, which comprises two 3×3 filters as shown in Fig. 2a, is convolved with all the pixels to create an edge direction histogram. For Intra 4×4, the mode with the maximum amplitude in the histogram, together with its neighbor modes and DC mode, is selected for intra coding. In [3], the Sobel operator is replaced by the Roberts cross shown in Fig. 2b, which reduces the number of pixels processed. These two algorithms introduce the widely used edge detection approach to H.264/AVC coding. However, they do not take the intra prediction scheme of H.264/AVC into account. A gravity approach is introduced in [4], in which the direction of the block’s gravity center is used to indicate the candidate modes. The pixels of a block are considered as a set of material points, and the intensities of these pixels are regarded as pixel mass to generate a unique location that is denoted as the gravity center of the block. Figure 3 shows the 2×2 edge filters used in [17]. Unlike filters used for seeking the minimal value, the mode of maximal value is chosen from a set of filters that are perpendicular to the prediction directions. When the maximum corresponds to the 5th filter operation, all the possible modes should be calculated. These filters are applied only once to a pseudo 2×2 block. Tsai et al. [16] present an intensity gradient approach for intra prediction using the filters shown in Fig. 4. Four filters are designed for the first four prediction directions along the prediction orientations, while the other four are set to the average of their two neighbor modes. The filters are calculated with the central four pixels in a 4×4 block, such as pixels f, g, j, and k of the block in Fig. 1a. However, four pixels alone are insufficient to fully extract the information of the entire block. The algorithm introduced in [14] applies the filters to all 16 pixels, and the filters, shown in Fig. 5, are slightly changed as well.
When processing all the pixels of a 4×4 block, the most probable directions are determined through a set of sort operations. In our previous work [2], we build the correspondence between modes and filters shown in Fig. 6. Unlike the filters in [14, 16], a unique filter is applied to each directional mode to eliminate the correlation among neighbor modes. However, we adopt a computationally expensive scheme, which is the same as that in [14]. Recently proposed fast decision methods have demonstrated how to reduce the block-size types as well as the prediction modes. For instance, in [19], a hierarchical intra mode decision method is proposed based on the sum of absolute Hadamard transform difference (SAHTD) and a quantization parameter (QP) dependent threshold. The algorithm in [6] utilizes a variance-based classification of texture complexity to make the block-size type selection and combines the fast mode algorithm in [17] with MPM to make the fast mode decision. Lee and Jang [9] propose another block-size type selection method in which not only the variances of the input pixels but also those of the prediction residual are taken into account. The prediction residual derived from intra 16×16 coding is assumed to be related to the RD performance. Thresholds are carefully determined for precise selection. For the mode selection part, the principle used is normally similar to that of the existing fast mode decision methods.

Fig. 2 Edge operators used in [3, 13]. a Sobel and b Roberts

Fig. 3 Five filters of edge operator used in [17]. a Vertical. b Horizontal. c Diagonal down-left. d Diagonal down-right. e Nondirection

Fig. 4 Filters used in [16]. a–d are the filters for the first four directional modes

Fig. 5 A slightly modified version of filters in [16] used in [14]

Fig. 6 Filters used in the hybrid method [2]. a–h correspond to the eight directional modes (m0–m8) supported in luma 4×4 and 8×8 block

2.3 The hybrid method

Recently, edge filters have been adopted in the hybrid method, and the filters used in [2] are shown in Fig. 6. These filters are applied in a different manner from the Sobel operator used in [13]. Take pixel f of Fig. 1a as an example. When the Sobel operator is applied, the degree of difference in the vertical direction, \(D_x\), is represented as

$$\label{SobelValues} D_x = |a - i + 2 \times (b - j) + c - k| $$
(1)

For the Sobel operator, the subtraction operations are first performed along the direction and then the sum is obtained. The absolute value of the sum represents the directional mode. To better represent the dominant edge, a different calculation order is adopted in [16]. That is, the absolute value of each subtraction along the direction is obtained first, and the mean of all the absolute values is the value used for comparison. For instance, when the filters shown in Fig. 6 are applied, the value \(D_0\) indicating the vertical mode is defined by

$$\label{values} D_0 = \frac 12 (|a - i| + |c - k|) $$
(2)

The pixels involved in this process are shown in Fig. 7. These pixels are processed with the edge filters to produce the values corresponding to the directional modes. These values are summed with the prediction residuals. Each pixel in the block has the same number of directional prediction values generated by the prediction process, and the prediction residuals are obtained by subtracting these prediction values from the original pixel values.
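The difference between the two calculation orders in (1) and (2) can be illustrated with a small Python sketch. The 3×3 layout (rows a b c / e f g / i j k around pixel f) and the sample intensities are assumptions made for illustration only; they are not taken from Fig. 1a or from any test sequence.

```python
def sobel_vertical(nb):
    """Eq. (1): sum the signed differences first, then take the absolute value."""
    (a, b, c), _, (i, j, k) = nb
    return abs((a - i) + 2 * (b - j) + (c - k))


def abs_first_vertical(nb):
    """Eq. (2): take the absolute differences first, then average them."""
    (a, _, c), _, (i, _, k) = nb
    return 0.5 * (abs(a - i) + abs(c - k))


# Hypothetical neighbourhood of pixel f, rows: (a b c), (e f g), (i j k).
# The left and right columns change in opposite directions.
neighbourhood = [
    [10, 50, 90],
    [50, 50, 50],
    [90, 50, 10],
]

d_x = sobel_vertical(neighbourhood)      # signed differences cancel out
d_0 = abs_first_vertical(neighbourhood)  # both differences are preserved
print(d_x, d_0)
```

In this example the signed differences cancel in (1), whereas (2) still reports a large variation along the direction, which is one way to read the remark that the second order better represents the dominant edge.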

Fig. 7 Pixels selected in the hybrid method [2]. a 4×4, b 8×8, and c 16×16

The prediction modes selected for intra coding are listed in Table 1. Taking I4MB as an example, the mode decision depends on a statistical scheme. Each pixel produces a series of values corresponding to the directional modes. A small value means that there is less variance along that direction, so that direction should be selected. A sort operation is used to obtain the two minimal values, and the modes they represent are counted. After all the pixels have been processed, the three modes with the highest frequency of appearance are chosen for intra coding along with the DC mode. Other block-size types are handled in a similar way. In the hybrid method, 3 out of 8 directional modes are selected for I4MB and I8MB, while only one directional mode is selected for those block-size types with 3 directional modes. As for chroma prediction, the same prediction mode is applied to the two 8×8 blocks (U and V). If the results calculated for the two blocks differ, only DC mode is selected, which further reduces the coding time. For blocks at the frame boundary, where the left or upper block may not be available, only the available prediction modes are taken into consideration.
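The sort-and-count scheme described above can be sketched as follows. This is a minimal illustration, not the implementation of [2]: the function name, the dictionary layout and the sample values are hypothetical, and only four directional modes are used to keep the example short. The DC mode number 2 follows the H.264/AVC numbering of Intra 4×4 modes.

```python
from collections import Counter

DC_MODE = 2  # DC is mode 2 in the H.264/AVC Intra 4x4 numbering


def select_by_voting(per_pixel_values, picks_per_pixel=2, keep=3):
    """Sort the filter values of each pixel, vote for the modes with the
    smallest values, then keep the most frequently voted modes plus DC."""
    votes = Counter()
    for values in per_pixel_values:          # values: {mode: filter value}
        ranked = sorted(values, key=values.get)
        votes.update(ranked[:picks_per_pixel])
    top = [mode for mode, _ in votes.most_common(keep)]
    return top + [DC_MODE]


# Hypothetical filter values for four selected pixels and four modes.
pixels = [
    {0: 1, 1: 5, 3: 9, 4: 2},
    {0: 2, 1: 7, 3: 1, 4: 8},
    {0: 3, 1: 9, 3: 8, 4: 1},
    {0: 9, 1: 8, 3: 1, 4: 2},
]
selected = select_by_voting(pixels)
print(selected)
```

Note that one sort is needed for every selected pixel, which is the cost that Section 3 sets out to avoid.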

Table 1 The number of selected modes of each algorithm

3 The proposed method

To derive useful information from local blocks, it is preferable to choose filters with smaller sizes. For I4MB and I8MB with 8 directional modes, the filter size is at least 3×3 if a unique filter is set for each mode. In this work, the edge filters shown in Fig. 6 are chosen through comprehensive experiments on filter selection for each block-size type. The first three filters are used to represent directional modes (i.e. horizontal, vertical, and plane) for I16MB and C8MB. Plane mode is considered as diagonal down-left mode.

The hybrid method has been shown to achieve better performance than conventional edge detection methods by adopting these filters. In this work, this advantage is retained and several other improvements are investigated. First, the sort operations at the stage of individual pixel processing are time consuming, so a summing method is used instead. Second, as the summing method is computationally efficient, more pixels can be used to improve the filtering results with a negligible increase in time. Finally, local information is exploited to enhance the mode selection accuracy. These improvements are described in detail in the following subsections.

3.1 Applying the filters to coding blocks

After generating the filters for the different macroblock partitions, the next problem is how to apply them at the pixel level. Intuitively, all the pixels within the coding block should be filtered. However, this would significantly increase the computational requirement. Our empirical results show that it is a better choice to apply the filters to several distributed pixels that cover most of the block region, i.e., the gray regions in Fig. 8. Each pixel produces one filter value for each directional mode. When all the selected pixels have been computed, the statistical data of each directional mode are obtained. It is then crucial to process these data for an accurate decision. In [2], the two directional modes with the smallest values are chosen for each pixel. After that, the three directional modes that appear most frequently over all the calculated pixels are finally selected for intra coding. In I4MB, when the edge filters are applied to one pixel, 8 filtering values are generated and each value corresponds to one prediction direction. A small value means that there is less variance along that direction, so that direction should be selected. Thus, these 8 values are sorted, the two smallest are chosen, and the directional modes they represent are counted. However, the sort operation tends to ignore the variance of the uncounted values. When there are noise-like pixels, the sort operation treats them as normal pixels, and as a result their influence on the mode decision cannot be revealed. In contrast, their filtering results play a more important role than those of normal pixels in the summing process. We observe that when all the data are summed according to the modes they represent, better performance is achieved and the sort operations are avoided during individual pixel processing.

Fig. 8 Pixels used for calculation with filters. a 4×4, b 8×8, and c 16×16

As for the selected pixels shown in Fig. 8, although up to 64 pixels are chosen for I16MB, the additional computational load is negligible with the summing method. In summary, the selection of directional modes in this work can be described as follows:

First, the edge filters are applied to each selected pixel n (n = 1, 2, ..., N) in order to obtain the filtered result \(D^n_m\) for each directional mode m (m = 1, 2, ..., M). The prediction residual \(P^n_m\) is acquired as well. Here N represents the number of selected pixels and M is the number of directional modes used in this block.

After that, the values of all the selected pixels are accumulated, and the mean filtered result and the mean prediction residual of each mode are summed as:

$$ H_m = \frac{1}{N} \sum_{n=1}^{N} D^n_m + \frac{1}{N} \sum_{n=1}^{N} P^n_m, \quad m = 1, 2, \dots, M $$
(3)

The directional modes with lower H are more likely to be selected for intra coding. Prediction residuals are also involved in this work because they reduce the degradation in coding performance that is partly due to the intra prediction scheme of H.264/AVC. Although prediction modes with directional features serve as a means of edge detection, the prediction processes are not identical for all the directions. For example, both the upper and the left reconstructed reference pixels can be used in mode 4, whereas mode 3 only uses the upper reference pixels, as specified in the H.264/AVC standard. When the upper-right block is unavailable, more than half of the pixels in the coding block are predicted only from the last of the upper reference pixels. As a result, the original pixels alone are insufficient to represent the prediction mode in that direction for comparison. The directional mode with the minimal value should have both a small filtering result along the direction and a small prediction residual.
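As an illustration of (3), the following Python sketch accumulates the filtered results and prediction residuals per mode and then ranks the modes once per block. The function names, the nested-list layout `D[n][m]` and the sample numbers are assumptions chosen for the example; mode indices here are simply positions 0 to M-1 rather than H.264/AVC mode numbers.

```python
def summed_scores(D, P):
    """Eq. (3): H_m = mean over pixels of D[n][m] + mean over pixels of P[n][m].

    D[n][m] is the filtered result and P[n][m] the prediction residual
    of selected pixel n for directional mode m.
    """
    N = len(D)
    M = len(D[0])
    return [
        sum(D[n][m] for n in range(N)) / N + sum(P[n][m] for n in range(N)) / N
        for m in range(M)
    ]


def rank_modes(H):
    """Return mode indices ordered from the lowest H (most probable) upward."""
    return sorted(range(len(H)), key=lambda m: H[m])


# Hypothetical data: N = 2 selected pixels, M = 3 directional modes.
D = [[4, 1, 7],
     [6, 2, 5]]
P = [[2, 1, 3],
     [4, 0, 3]]

H = summed_scores(D, P)
order = rank_modes(H)
print(H, order)
```

Only one sort over M values is performed per block, after the accumulation, instead of one sort per selected pixel.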

3.2 Combined with local information

In the conventional edge-detection-based fast mode decision algorithms, DC mode is taken as the default mode and is always computed in intra coding. Recently, MPM has been introduced to utilize the local relationship between neighbor blocks. Moreover, a belief propagation technique for inferring the potentially best mode of the current block is adopted in [12], in which the best mode of a coded block is refined by passing information forward and backward among the local blocks. In our proposed method, MPM is fully exploited to guide the mode selection procedure. For blocks with all prediction modes available in I4MB and I8MB, the MPM can be any of the nine prediction modes and is generated by comparing the best prediction modes of the upper and left blocks. The fast mode decision is then made according to MPM. If MPM has a high probability of being the best mode, fewer modes are employed for intra coding; otherwise, more candidate modes should be considered for prediction accuracy. In other words, MPM is selected first, while other modes are chosen or not depending on their importance relative to MPM. The relationship between MPM and the best mode in the case of I4MB is given in Table 2. Similar properties can be found in I8MB, and we take I4MB as an example for convenience. In Table 2, \(H_i\) (i = 0, ..., 7) denotes the directional modes sorted by the values obtained from (3). BM represents the best mode determined by H.264/AVC. From this table, we can see that \(H_0\) is the best choice for nearly half of the blocks. The larger the filtering result H is, the less likely the mode is to be the best one. In [6], three directional modes are always selected, and DC mode is chosen only when MPM is one of the three directional modes. This is not an appropriate scheme for the edge filtering results, as Table 2 shows that DC mode is more frequently chosen as the best mode than \(H_2\).
Moreover, to further reduce the coding time without degrading the coding performance, an adaptive number of selected modes is more suitable than a fixed set of four candidate modes. For example, if MPM is \(H_0\), it is highly likely to be the best mode and fewer modes are selected. The situation in which MPM fails to be the best mode is also studied, and the results are shown in Table 3. The table indicates the other candidate modes that should be considered, and each column shows how often the other modes are the best mode. The number of selected modes for each condition is described as follows.

(1) MPM is \(H_0\). When the best mode differs from MPM, it is \(H_1\) for over one third of those blocks. In this case, to further reduce the coding time, no other modes are selected, which saves considerable time with little performance degradation. From Table 3, we can also see that \(H_0\) is highly recommended for selection. Therefore, \(H_0\) is still selected when MPM is any other mode.

(2) MPM is DC, which is the second most frequent MPM. Unlike (1), nearly half of the MPMs are not the best choice in the sequence “Foreman”, and the amount becomes even smaller for video material with plenty of detail, such as “Mobile”. Thus more directional modes should be selected, and \(H_1\) is included besides \(H_0\). Considering the large number of MPMs in this case, \(H_2\) is not selected in order to save computation time.

(3) MPM is \(H_1\). In this case, most of the best modes that are not MPM are \(H_0\). We make a choice similar to (2), selecting \(H_0\), \(H_1\), and DC for intra coding.

(4) MPM is \(H_2\). The directional modes ahead of it are chosen in addition to DC, which is the same as in [2, 6].

(5) MPM is one of the other directional modes. In this case, three modes, i.e., \(H_0\), \(H_1\), and \(H_2\), are taken into account together with DC and MPM. Although one additional mode is selected compared with the method in [6], more coding gain is achieved with a relatively small increase in coding time.

The mode selections among the 8 directional modes for I4MB and I8MB are listed in Table 4. They can be changed to suit different requirements. For faster intra coding, the selected modes of those blocks with MPM equal to \(H_i\) (i = 3, ..., 7) can be further reduced; for less coding performance loss, \(H_2\) can be included for those blocks whose MPM equals \(H_1\) or DC. Mode selection for I16MB and C8MB is the same as in the hybrid method. For I16MB, only one directional mode is selected according to the filtering results, in addition to DC mode. In the case of C8MB, if U and V yield the same directional mode, this mode is selected along with DC mode. Otherwise, only DC mode is computed.
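The five cases above can be summarized in a short Python sketch. It is only one reading of the text, since Table 4 is not reproduced here: in particular, treating case (1) as "MPM only" is an assumption, and the function names are hypothetical. The MPM derivation follows the H.264/AVC rule (the smaller of the left and upper block modes, with DC used when a neighbor is unavailable), and `ranked` is assumed to hold the directional mode numbers sorted by ascending H from (3), so that `ranked[0]` is \(H_0\).

```python
DC_MODE = 2  # DC is mode 2 in the H.264/AVC Intra 4x4 / 8x8 numbering


def most_probable_mode(left_mode, upper_mode):
    """H.264/AVC rule: minimum of the neighbour modes, DC if one is missing."""
    if left_mode is None or upper_mode is None:
        return DC_MODE
    return min(left_mode, upper_mode)


def select_candidates(mpm, ranked):
    """Pick candidate modes from MPM and the H-sorted directional modes."""
    h0, h1, h2 = ranked[:3]
    if mpm == h0:                        # case (1): assumed to keep MPM only
        return [h0]
    if mpm == DC_MODE or mpm == h1:      # cases (2) and (3)
        return [h0, h1, DC_MODE]
    if mpm == h2:                        # case (4)
        return [h0, h1, h2, DC_MODE]
    return [h0, h1, h2, DC_MODE, mpm]    # case (5): MPM is H_3 ... H_7


# Hypothetical ranking of the eight directional modes by ascending H.
ranked = [1, 0, 4, 3, 5, 6, 7, 8]
mpm = most_probable_mode(left_mode=5, upper_mode=7)
print(mpm, select_candidates(mpm, ranked))
```

The number of candidates therefore varies between one and five per block, which is the adaptive behavior the section argues for in place of a fixed set of four modes.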

Table 2 The number (Num) and percentage (%) of each mode to be the best mode (BM), MPM, and both of most probable mode and best mode (MPBM) in case of I4MB and QP = 28 for encoding 300 frames
Table 3 Modes that are the best mode but not MPM in the case of I4MB for the “Foreman” sequence in QCIF format
Table 4 Other selected modes based on MPM for I4MB and I8MB

4 Experimental results and discussion

The proposed algorithm was implemented on JM 18.3 provided by JVT [8]. The experimental settings are based on recommended simulation common conditions provided in [15]. More details of the configurations are as follows.

(1) All the frames are encoded as intra frames.

(2) RD optimization and CABAC are enabled.

(3) QPs are set to be 22, 27, 32, and 37.

(4) RD optimization quantization and fast chroma prediction are turned off.

The performance evaluation is made with respect to the JM reference software based on the following performance measures: the peak signal-to-noise ratio (PSNR) changes ΔPSNR (in decibel), the total bitrate (BR) increases ΔBR (in percentage), and the time saving ΔT (in percentage). The average PSNR value of luma (Y) and chroma (U, V) is used, which is based on the equations below [13]:

$$\label{psnr} \overline{PSNR}=10\log_{10}\left(\frac{255^{2}}{\overline{MSE}}\right) $$
(4)

where the average mean square error (MSE) is given by

$$\label{yuv} \overline{MSE}=\frac{4 \times MSE_{Y} + MSE_{U} + MSE_{V}}{6} $$
(5)
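For reference, (4) and (5) translate directly into a few lines of Python. The function names and the sample MSE values are illustrative assumptions; the peak value of 255 corresponds to 8-bit samples as in (4).

```python
import math


def average_mse(mse_y, mse_u, mse_v):
    """Eq. (5): luma weighted four times as heavily as each chroma plane."""
    return (4 * mse_y + mse_u + mse_v) / 6


def average_psnr(mse_y, mse_u, mse_v, peak=255):
    """Eq. (4): PSNR in dB computed from the weighted average MSE."""
    mse = average_mse(mse_y, mse_u, mse_v)
    if mse == 0:
        return math.inf  # identical pictures; PSNR is unbounded
    return 10 * math.log10(peak ** 2 / mse)


# Hypothetical per-plane MSE values for one encoded sequence.
print(average_mse(6.0, 3.0, 3.0))
print(round(average_psnr(6.0, 3.0, 3.0), 2))
```

The change ΔPSNR reported in the tables is then the difference between this value for the tested method and for the JM reference encoder.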

The summing method given in (3) is first compared with the sort-based method in [2] using the same edge filters and test conditions. The comparison results are listed in Table 5, which shows that the summing method achieves better RD performance with a lower computational requirement than the sort-based method. Consequently, we incorporate the summing method into the proposed method to obtain better coding performance.

Table 5 The comparison of sort based approach and summing method

Table 6 compares the proposed method with Tsai et al.’s method [16], Wang et al.’s method [17], Huang et al.’s method [6], the gravity method [4], Lee et al.’s method [9], and the hybrid method [2]. The experimental results show that the proposed algorithm achieves around 0.09 dB PSNR gain and 2 % bit rate reduction, at the cost of a 4 % encoding time increase, compared with [4]. Based on the statistics in Table 1, it can be seen that the time reduction of the gravity method is primarily due to the fact that only one directional mode is selected in I4MB. Although this further reduces the encoding time, it inevitably degrades the coding gain as well. The proposed method also achieves 0.03 dB PSNR gain, 1 % bit rate reduction, and similar encoding time reduction compared with [9]. The proposed method outperforms Tsai et al.’s method [16] in all respects. In comparison with Wang et al. [17] and Huang et al. [6], the proposed method achieves more time reduction with less PSNR drop and bitrate increase. The proposed method also outperforms the previous work [2], which further demonstrates the efficiency of using the summing and MPM techniques in the proposed method. In general, the proposed method is preferable for intra mode decision. In addition, the performance of the proposed algorithm under different QPs is listed in Table 7. TR, measured in seconds, represents the time reduction achieved by the proposed method. The RD curves for the “Coastguard” and “Station2” sequences are depicted in Fig. 9 [1]. The figure demonstrates that the curves of [9] and the hybrid method are very close to each other, while the proposed method is closer to the JM reference than all the other methods, especially when coding the 1080p sequence, which illustrates that the proposed method is suitable for High Definition (HD) sequences.

Table 6 Performance comparison
Table 7 The performance of proposed method with QP = 22, 27, 32, 37
Fig. 9 RD curves compared to existing algorithms for sequences: a Coastguard in CIF, and b Station2 in 1080p

It is known that the intra-mode in H.264/AVC, which utilizes spatial prediction and is transformed by discrete cosine transform (DCT), is comparable to wavelet-based still image coding standard Joint Photographic Experts Group (JPEG) 2000 [11]. Hence, the proposed improved block-based hybrid coding scheme is of significant benefit to still-image coding as well.

5 Conclusion

In this paper, an improved hybrid fast intra mode decision method for H.264/AVC is presented, in which the computationally expensive sort procedure is replaced by a summing method and local information is incorporated through MPM. Extensive experimental results show that the proposed method achieves 77 % total encoding time reduction with only 0.05 dB PSNR degradation and 0.4 % bit rate increase on average. This performance is better than that of most of the well-known fast intra mode decision algorithms for H.264/AVC.