1 Introduction

High efficiency video coding (HEVC) is the newest video coding standard [21, 24]. It is the successor to H.264/AVC [10] and was developed by the Joint Collaborative Team on Video Coding (JCT-VC), a standardization committee formed by the Video Coding Experts Group (VCEG) and the Moving Picture Experts Group (MPEG). Compared with H.264/AVC, HEVC is able to double the compression efficiency while keeping almost the same subjective quality of the reconstructed video. The improvement in coding efficiency is achieved by exploiting many new techniques, such as a quad-tree coding structure with coding units (CUs), prediction units (PUs) and transform units (TUs) whose block sizes range from 4 × 4 to 64 × 64, advanced motion vector (MV) prediction, and so on [28]. Due to its high coding performance, HEVC is expected to be used in various applications, such as HDTV, digital satellite broadcasting, and so on.

As is well known, rate control plays an important role in video coding; its goal is to maximize video quality under bit rate and codec buffer constraints in a robust and accurate manner. Similar to H.264/AVC, HEVC adopts Lagrangian rate distortion optimization (RDO) [29] to identify the encoding settings that achieve the optimal rate distortion (RD) performance. However, HEVC still optimizes the coding performance in terms of the sum of squared error (SSE), which is not well correlated with perceptual quality, as discussed in [9]. Moreover, since video quality is ultimately judged by human eyes, it is essential to develop a perceptual rate control method for HEVC that provides the optimal perceptual quality under a given bit rate.

In this paper, an efficient perceptual rate control method for HEVC, called perceptual sensitivity-based rate control (PSRC), is proposed by considering the perceptual characteristics of the input video content. In our approach, a perceptual sensitivity measurement is first developed to evaluate the perceptual sensitivity of each coding tree unit (CTU) and each frame. Then, bit allocation is guided by the obtained perceptual sensitivity, and an improved R-λ model is exploited to determine the quantization parameter (QP) that meets the allocated target bits. Experimental results demonstrate that the proposed PSRC method improves the perceptual coding performance compared with the original rate control in HEVC.

The rest of this paper is organized as follows. Related works are briefly reviewed in Section 2. The proposed perceptual sensitivity-based rate control (PSRC) method for HEVC is presented in detail in Section 3. Extensive simulation results are documented and discussed in Section 4. Finally, conclusions are drawn in Section 5.

2 Related works

There have been many works on rate control for previous video coding standards (e.g., H.264/AVC) [5]. In these methods, the common approach is to establish a rate-distortion/quantization model based on the characteristics of the residual or the input video and then use this model to determine a suitable QP. Among them, assuming that the prediction residual follows a Laplacian distribution, Chiang et al. [6] presented a quadratic rate distortion model that uses the mean absolute difference (MAD) to estimate the complexity of the input video. Liu et al. [16] presented a switched MAD prediction scheme to reduce abrupt MAD changes and a linear rate-quantization model to describe the relationship between the bits and the QP. Kwon et al. [11] presented a rate control method for H.264, in which the interdependency between RDO and rate control is addressed by QP estimation and update, and the bits for coding header information are estimated as a function of the number of nonzero MV elements. An et al. [1] suggested an iterative RDO method for H.264 using primal-dual decomposition and sub-gradient projection. Dong et al. [8] suggested a context-adaptive model parameter prediction scheme that uses spatial and temporal correlations, so that the accuracy of the MAD, the model parameters and the bit matching can be significantly improved. Tsai et al. [23] improved the rate control performance of intra coding by applying a Taylor-series-based, scene-change-aware rate-quantization step size model. However, these methods are not directly applicable to HEVC, because HEVC exploits a more complicated quad-tree coding structure with variable-sized CUs, PUs and TUs, which differs from that of previous video coding standards. Therefore, it is both necessary and challenging to develop a novel rate control algorithm for HEVC.

To this end, multiple rate control methods for HEVC have recently been proposed. Choi et al. [7] presented a quadratic pixel-based unified rate-quantization model for HEVC that accounts for the fact that the PU size varies from CU to CU. Based on the observation that the Lagrangian multiplier is more important than the QP for achieving the target bits, Li et al. [13] suggested a rate-λ (R-λ) model to perform rate control for HEVC instead of the conventional rate-quantization model. It should be pointed out that this R-λ model has been incorporated into the HEVC reference software. Moreover, Lee et al. [12] presented a frame-level rate control scheme for HEVC based on texture and non-texture models. More specifically, the texture model is established using the transform residual, with CUs categorized into three classes (low-, medium- and high-textured), while the non-texture model is developed by considering the different characteristics of the non-texture bits at various CU depths. By taking into account the inter-frame dependency between the coding frame and its reference frame, Wang et al. [26] proposed inter-frame-dependency-based rate and distortion models. Based on these models and a mixed Laplacian distribution of the residual information, a ρ-domain, frame-level, rate-GOP (group of pictures) based rate control scheme is presented.

Unfortunately, the above-mentioned rate control methods for H.264/AVC and HEVC do not consider the characteristics of the human visual system (HVS) and might be inefficient in the sense of perceptual video coding. Since video quality is ultimately judged by the human eye, developing a perceptual rate control method that incorporates HVS characteristics has attracted increasing attention from both academia and industry. Although some perceptual rate control methods exist for H.264/AVC (e.g., [18, 27]), they cannot be directly applied to HEVC due to the different coding structures. For HEVC, considering that saliency represents the probability of human attention, Li et al. [14] incorporated graph-based visual saliency into the quantization control so that a larger QP is assigned to CUs with a lower probability of attention. Li et al. [15] developed a weight-based unified rate-quantization (URQ) scheme for perceptual coding of conversational video in HEVC. Based on the observation that humans are usually attracted to faces in conversational video, a hierarchical perceptual model of the face is used to compute a weight map, which is then utilized to guide the bit allocation. In general, perceptual rate control for HEVC has not been well investigated. Therefore, we focus on developing a perceptual rate control method for HEVC.

3 Proposed Perceptual Sensitivity-Based Rate Control (PSRC) for HEVC

The proposed PSRC method consists of three parts, described in the following three sub-sections: 1) a perceptual sensitivity measurement is developed to evaluate the perceptual characteristics of the input video; 2) bit allocation is performed based on the obtained perceptual sensitivity; and 3) an improved R-λ model is utilized to meet the target bits, including QP determination and parameter updating.

3.1 Perceptual sensitivity measurement

The development of a perceptual sensitivity measurement for guiding the video coding process should satisfy the following requirements. First, the measurement should describe both HVS perception and video coding distortions (e.g., quantization artifacts) well. Second, it should have low complexity and be easy to incorporate into the video coding process (e.g., rate control). Therefore, although many existing visual quality assessment metrics measure HVS perception well [19, 20], they are not applicable to the video coding process. As we know, a conventional video codec optimizes the RD performance in terms of SSE, which is computationally efficient; however, the correlation between SSE and HVS perception is poor [29]. Further study finds that there is a strong linear relationship between the mean squared error (MSE) and HVS perception [2, 22].

Motivated by this, we propose a perceptual sensitivity measurement as a function of MSE, which not only indicates HVS perception better than MSE alone but is also easy to incorporate into the video codec. For each CTU, the proposed perceptual sensitivity measurement (PSM) is defined by weighting its MSE according to its perceptual characteristics, and the PSM of the whole frame is computed by simply summing the PSM of each CTU in the frame:

$$ PSM_i = 1 + k_i \times MSE_i, \qquad PSMF = \sum_{i=1}^{N} PSM_i $$
(1)

where $MSE_i$ is the mean squared error between the original and reconstructed $i$-th CTU, $N$ is the number of CTUs in the current frame $f$, and $k_i$ is the perceptual weighting factor indicating the perceptual characteristics of the $i$-th CTU. Intuitively, the visibility of distortion in a video scene depends on its spatial texture and temporal motion. Accordingly, the perceptual weighting factor $k_i$ is computed by considering both spatial texture and temporal motion. First, from the spatial viewpoint, a highly textured region can generally tolerate more distortion than a low/medium-textured region. Hence, the spatial texture complexity (STC) is developed as the spatial perceptual feature of each CTU to evaluate its texture complexity:

$$ STC_i = \frac{\sigma_m}{\sigma_f} \times \frac{1}{\sigma_i} $$
(2)

where $\sigma_i$ is the variance of the $i$-th CTU in the current frame $f$, $\sigma_f$ is the variance of the current frame, and $\sigma_m$ is the variance of the mean values of all the CTUs in the current frame, which can be computed as:

$$ \sigma_m = \frac{1}{N} \sum_{i=1}^{N} \left( m_i - \frac{1}{N} \sum_{j=1}^{N} m_j \right)^2 $$
(3)

where $m_i$ is the mean value of the $i$-th CTU in the current frame $f$ and $N$ is the number of CTUs in the current frame. $STC_i$ makes full use of two texture masking properties: global smoothness $\frac{\sigma_m}{\sigma_f}$ and local contrast $\frac{1}{\sigma_i}$. One can see that the smaller the STC value is, the larger the CTU variance is, which means that the current CTU is more likely to contain complex texture.
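For illustration, the STC computation in (2) and (3) can be sketched as follows (a minimal sketch, assuming the frame has already been partitioned into CTU-sized luma blocks; the function name and the small epsilon guard against zero-variance blocks are our own additions):

```python
import numpy as np

def spatial_texture_complexity(ctu_blocks):
    """Compute STC_i for every CTU in a frame, following Eqs. (2)-(3).

    ctu_blocks: list of 2-D numpy arrays (luma samples of each CTU).
    Returns an array of STC values, one per CTU.
    """
    eps = 1e-6                                   # guard against flat (zero-variance) blocks
    ctu_vars  = np.array([blk.var()  for blk in ctu_blocks])   # sigma_i of each CTU
    ctu_means = np.array([blk.mean() for blk in ctu_blocks])   # m_i of each CTU

    sigma_f = np.concatenate([blk.ravel() for blk in ctu_blocks]).var()  # frame variance
    sigma_m = ctu_means.var()                    # variance of the CTU mean values, Eq. (3)

    # Eq. (2): global smoothness (sigma_m / sigma_f) times local contrast (1 / sigma_i)
    return (sigma_m / (sigma_f + eps)) / (ctu_vars + eps)
```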

Second, from the temporal viewpoint, people are more interested in a moving region than in a stationary one. Therefore, the temporal motion activity (TMA) is presented as the temporal perceptual feature of each CTU; namely, the MV length is used to evaluate the motion activity. As suggested in [17], once a moving object is so fast that it exceeds the spatio-temporal resolution capacity of the HVS, the moving region is smoothed and motion blurring occurs. In such motion-blurred regions, people cannot perceive good visual quality and tend to ignore them. Therefore, $TMA_i$ for each CTU is computed as:

$$ TMA_i = \begin{cases} 1 & \text{if } |x_i| + |y_i| > L, \\ \sqrt{x_i^2 + y_i^2} + 1 & \text{otherwise,} \end{cases} $$
(4)

where $MV_i = \{x_i, y_i\}$ is the MV of the $i$-th CTU in the current frame, and $L$ is empirically determined as 8 through extensive experiments. A smaller TMA value indicates that the CTU has lower motion activity; in particular, $TMA_i = 1$ means that the CTU is likely to fall in a motionless region or a motion-blurred region.
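A minimal sketch of the TMA computation in (4) is given below (the function name is hypothetical; only the per-CTU MV, obtained from the pre-analysis described in Section 3.2, is assumed to be available):

```python
import numpy as np

def temporal_motion_activity(mv_x, mv_y, L=8):
    """Compute TMA_i for one CTU from its motion vector, following Eq. (4).

    mv_x, mv_y: horizontal and vertical MV components of the CTU.
    L: empirical threshold beyond which motion blurring is assumed (L = 8 in the paper).
    """
    if abs(mv_x) + abs(mv_y) > L:
        # Very fast motion: the region is likely motion-blurred and less sensitive.
        return 1.0
    # Otherwise use the MV length plus 1, so stationary CTUs also yield TMA = 1.
    return np.sqrt(mv_x ** 2 + mv_y ** 2) + 1.0
```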

Based on the above analysis, for the current CTU we individually obtain two perceptual features: the spatial texture complexity (STC) and the temporal motion activity (TMA). In order to make the spatial STC and the temporal TMA contribute equally to the evaluation of perceptual distortion, the product of these two perceptual features is simply used as the perceptual weighting factor $k_i$:

$$ k_i = STC_i \times TMA_i $$
(5)

It should be pointed out that the proposed perceptual sensitivity measurement in (1) reflects the texture masking property of the HVS well. More specifically, if distortion with the same MSE occurs in both a complex-textured CTU and a smooth CTU, the perceptual quality reduction in the complex-textured CTU tends to be smaller than that in the smooth one; similarly, if distortion with the same MSE occurs in both a moving CTU and a stationary CTU, the quality reduction is more easily perceived in the moving region.
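Combining (1) and (5), the per-CTU and per-frame perceptual sensitivity can be sketched as follows (function names are our own; the STC, TMA and MSE values are assumed to come from the pre-analysis):

```python
def perceptual_sensitivity(stc, tma, mse):
    """PSM_i of one CTU, Eqs. (1) and (5): k_i = STC_i * TMA_i, PSM_i = 1 + k_i * MSE_i."""
    return 1.0 + (stc * tma) * mse

def frame_psm(stc_list, tma_list, mse_list):
    """PSMF of a frame: sum of the per-CTU PSM values, Eq. (1)."""
    return sum(perceptual_sensitivity(s, t, m)
               for s, t, m in zip(stc_list, tma_list, mse_list))
```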

3.2 Bit allocation guided by perceptual sensitivity

For perceptual rate control, the key problem is how to allocate the bits for each frame and each CTU according to their perceptual sensitivity, which is computed by using the PSM developed in Section 3.1. Bit allocation is then performed in the order of GOP level, frame level, and CTU level, as follows.

1) Pre-analysis process

Before bit allocation, we perform a pre-analysis process to obtain the PSM of each CTU and each frame. In this pre-analysis process, only the nearest previous frame is used as the reference frame and only the 2N × 2N partition mode is tested for each CTU to obtain its MV; the MSE is computed between the original CTU and its best matching block. From this process, the spatial texture complexity $STC_i$ and the temporal motion activity $TMA_i$ of each CTU are computed using (2) and (4), respectively. Then, the PSM of each CTU, $PSM_i$, and of each frame, $PSMF$, are computed using (1). Note that the first video frame is a special case: there is no reference frame for motion estimation, so $PSM_i$ is directly set to 1 for all CTUs in the first frame.
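A simplified sketch of this pre-analysis for one CTU is shown below (a plain full search over the luma plane with a fixed search range; the search range, function name and interface are our own assumptions and not specified by the paper, and the CTU is assumed to lie entirely within the frame):

```python
import numpy as np

def preanalyze_ctu(cur, ref, x0, y0, size=64, search=8):
    """Single-reference, 2Nx2N pre-analysis for one CTU.

    cur, ref : current and previous luma frames (2-D numpy arrays).
    x0, y0   : top-left corner of the CTU; size: CTU size.
    search   : full-search range in samples (our own choice).
    Returns (mv_x, mv_y, mse) for the best 2Nx2N match.
    """
    h, w = cur.shape
    block = cur[y0:y0 + size, x0:x0 + size].astype(np.float64)
    best = (0, 0, np.inf)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x, y = x0 + dx, y0 + dy
            if x < 0 or y < 0 or x + size > w or y + size > h:
                continue                      # skip candidates outside the frame
            cand = ref[y:y + size, x:x + size].astype(np.float64)
            mse = np.mean((block - cand) ** 2)
            if mse < best[2]:
                best = (dx, dy, mse)
    return best
```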

2) GOP level bit allocation

Note that GOP level bit allocation is conducted in the commonly-used way suggested in [13]. Suppose $R_T$ is the target bit rate, $f$ is the frame rate, $N_{GOP}$ is the GOP length, $N_{coded}$ is the number of frames that have been coded, $SW$ is the size of the smooth window, which is used to make the bit rate change more smoothly, and $R_{used}$ is the number of bits that have already been used. First, the average bits for each frame can be simply computed as:

$$ {R}_{FraAvg}=\frac{R_T}{f} $$
(6)

Then, the target bits for each frame are computed as below, consisting of two terms: the average bits per frame and the buffer status.

$$ {T}_{AvgFra}={R}_{FraAvg}+\frac{R_{FraAvg}\cdot {N}_{coded}-{R}_{used}}{SW} $$
(7)

Finally, the target bits for the current GOP are:

$$ {T}_{GOP}={N}_{GOP}\cdot {T}_{AvgFra} $$
(8)
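Equations (6)-(8) can be realized, for example, as the following sketch (the default smooth-window size is only a typical value, not mandated by the paper):

```python
def gop_target_bits(R_T, frame_rate, n_gop, n_coded, bits_used, smooth_window=40):
    """GOP-level bit budget, Eqs. (6)-(8).

    R_T          : target bit rate (bits per second).
    frame_rate   : frames per second.
    n_gop        : GOP length; n_coded: frames coded so far; bits_used: bits spent so far.
    smooth_window: SW in Eq. (7) (a typical value; the paper does not fix it here).
    """
    r_fra_avg = R_T / frame_rate                                              # Eq. (6)
    t_avg_fra = r_fra_avg + (r_fra_avg * n_coded - bits_used) / smooth_window # Eq. (7)
    return n_gop * t_avg_fra                                                  # Eq. (8)
```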
3) Frame level bit allocation

Frame level bit allocation is performed after the bits for the current GOP have been obtained. Different frames in a GOP have different influences on the subjective quality of the whole GOP. Considering that a frame with higher perceptual sensitivity should be assigned more bits, the bits are allocated to each frame based on its perceptual sensitivity $PSMF$ computed in (1). Therefore, the target bits of the $j$-th frame in the current GOP are determined as:

$$ T_{Fra}^{j} = \frac{T_{GOP} - R_{used}^{GOP}}{\sum_{NotCodedFrames} PSMF_i} \cdot PSMF_j $$
(9)

where $R_{used}^{GOP}$ denotes the bits that have already been used for coding the frames of the current GOP and $PSMF_j$ acts as the weight of the $j$-th frame in the current GOP. One can see that the higher the perceptual sensitivity of a frame, the more bits it is assigned.
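A minimal sketch of the frame level allocation in (9) is given below (the function name and argument layout are our own):

```python
def frame_target_bits(t_gop, bits_used_gop, psmf_not_coded, psmf_current):
    """Frame-level allocation, Eq. (9).

    t_gop          : target bits of the current GOP.
    bits_used_gop  : bits already spent on the coded frames of this GOP.
    psmf_not_coded : PSMF values of the not-yet-coded frames (incl. the current one).
    psmf_current   : PSMF of the frame about to be coded.
    """
    return (t_gop - bits_used_gop) / sum(psmf_not_coded) * psmf_current
```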

4) CTU level bit allocation

Similarly, different CTUs in a frame have different effects on the subjective quality of the whole frame. Hence, the target bits of the current CTU, $T_{CurrCTU}$, are allocated according to its perceptual sensitivity computed in (1) as follows.

$$ T_{CurrCTU} = \frac{T_{CurrFra} - Bit_{header} - Coded_{Fra}}{\sum_{NotCodedCTUs} PSM_i} \cdot PSM_{CurrCTU} $$
(10)

where $Bit_{header}$ is the estimated number of header bits (including slice header, MVs, prediction modes, etc.), estimated from the actual header bits of previously coded pictures belonging to the same level, and $Coded_{Fra}$ is the number of bits already consumed by the coded CTUs of the current frame.
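Analogously, the CTU level allocation in (10) can be sketched as follows (function name and interface are our own assumptions):

```python
def ctu_target_bits(t_frame, bits_header, bits_coded_ctus,
                    psm_not_coded, psm_current):
    """CTU-level allocation, Eq. (10).

    t_frame         : target bits of the current frame.
    bits_header     : estimated header bits (from previously coded frames of the same level).
    bits_coded_ctus : bits already spent on the coded CTUs of this frame.
    psm_not_coded   : PSM values of the not-yet-coded CTUs (incl. the current one).
    psm_current     : PSM of the CTU about to be coded.
    """
    return (t_frame - bits_header - bits_coded_ctus) / sum(psm_not_coded) * psm_current
```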

With the above bit allocation guided by perceptual sensitivity, frames and CTUs with higher perceptual sensitivity are assigned more bits, so that the perceptual quality of the reconstructed video can be improved.

3.3 An improved R-λ model

After the CTU level bit allocation, the question becomes how to determine the QP so as to meet the target bits for each CTU. For simplicity, the R-λ model [13] is used, as it has higher accuracy than the traditional rate-quantization model:

$$ \lambda = \alpha \cdot bpp^{\beta} $$
(11)

where $\alpha$ and $\beta$ are model parameters related to the characteristics of the input video; hence, different CTUs have different $\alpha$ and $\beta$ values. The initial values of $\alpha$ and $\beta$ are empirically set to 3.2003 and −1.367, respectively. Note that these initial values are not critical, since they are continually updated during the encoding process. The update principle is based on the assumption that collocated CTUs in different frames belonging to the same frame level share the same parameters $\alpha$ and $\beta$. Here $bpp$ denotes the bits per pixel, which is computed as:

$$ bpp=\frac{T_{CurrCTU}}{w\cdot h} $$
(12)

where $w$ and $h$ are the width and height of the CTU, respectively.

The target bits $T_{CurrCTU}$ for each CTU are computed by (10) and then used to compute $bpp$ via (12). Finally, the $\lambda$ value is derived from (11) and exploited to determine the QP value according to (13).

$$ QP=6.3256 \ln \lambda +21.8371 $$
(13)

To keep the visual quality consistent, the $\lambda$ value and the determined QP value of the current CTU are clipped to a narrow range, as follows.

$$ \lambda_{lastCTU} \cdot 2^{-1/3} \le \lambda_{currCTU} \le \lambda_{lastCTU} \cdot 2^{1/3}, \qquad QP_{lastCTU} - 3 \le QP_{currCTU} \le QP_{lastCTU} + 3 $$
(14)

where $\lambda_{currCTU}$ and $QP_{currCTU}$ are the $\lambda$ and QP values of the current CTU, and $\lambda_{lastCTU}$ and $QP_{lastCTU}$ are those of the previously encoded CTU.
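Putting (11)-(14) together, the per-CTU QP derivation can be sketched as follows (the rounding of QP and the guard against a non-positive bit budget are our own choices):

```python
import math

def ctu_lambda_qp(t_ctu, width, height, alpha, beta,
                  lambda_last=None, qp_last=None):
    """Derive lambda and QP for a CTU from its bit budget, Eqs. (11)-(14)."""
    bpp = max(t_ctu, 1.0) / (width * height)      # Eq. (12); guard against non-positive budgets
    lam = alpha * (bpp ** beta)                   # Eq. (11), R-lambda model
    if lambda_last is not None:                   # Eq. (14): limit lambda variation
        lam = min(max(lam, lambda_last * 2 ** (-1.0 / 3.0)),
                  lambda_last * 2 ** (1.0 / 3.0))
    qp = round(6.3256 * math.log(lam) + 21.8371)  # Eq. (13)
    if qp_last is not None:                       # Eq. (14): limit QP variation
        qp = min(max(qp, qp_last - 3), qp_last + 3)
    return lam, qp
```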

In order to adapt to the characteristics of the input video, several parameters need to be continually updated during the encoding process: the perceptual sensitivity of each CTU, $PSM_i$, the perceptual sensitivity of each frame, $PSMF$, and the model parameters $\alpha$ and $\beta$. The former two are updated once per GOP in the pre-analysis process described in Section 3.2. The latter two, $\alpha$ and $\beta$, are updated using the actual encoded $bpp$ (i.e., $bpp_{real}$), the actually used $\lambda$ value (i.e., $\lambda_{real}$) and the perceptual sensitivity values as:

$$ \begin{aligned} \lambda_{comp} &= \alpha_{old} \cdot \left( \frac{bpp_{real}}{PSM_i} \right)^{\beta_{old}} \\ \alpha_{new} &= \alpha_{old} + \delta_{\alpha} \cdot \left( \ln \lambda_{real} - \ln \lambda_{comp} \right) \cdot \alpha_{old} \\ \beta_{new} &= \beta_{old} + \delta_{\beta} \cdot \left( \ln \lambda_{real} - \ln \lambda_{comp} \right) \end{aligned} $$
(15)

where $\lambda_{comp}$ is the $\lambda$ computed from the old model parameters using the $bpp_{real}$ value and the PSM value, and $\delta_{\alpha}$ and $\delta_{\beta}$ are empirically set to 0.1 and 0.05, respectively. Note that the final values of $\alpha$ and $\beta$ are clipped to the pre-determined ranges [0.05, 20] and [−3.0, −0.1], respectively, as suggested in [13].
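A minimal sketch of the parameter update in (15), including the clipping suggested in [13], is given below (function and argument names are our own):

```python
import math

def update_model(alpha_old, beta_old, bpp_real, lambda_real, psm_i,
                 d_alpha=0.1, d_beta=0.05):
    """Update alpha and beta after coding a CTU, Eq. (15)."""
    lambda_comp = alpha_old * (bpp_real / psm_i) ** beta_old
    err = math.log(lambda_real) - math.log(lambda_comp)
    alpha_new = alpha_old + d_alpha * err * alpha_old
    beta_new = beta_old + d_beta * err
    # Clip to the ranges suggested in [13].
    alpha_new = min(max(alpha_new, 0.05), 20.0)
    beta_new = min(max(beta_new, -3.0), -0.1)
    return alpha_new, beta_new
```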

3.4 Proposed PSRC method

In summary, the proposed PSRC method can be described as below:

[Algorithm figure: overall flow of the proposed PSRC method]

4 Experimental results and discussions

To evaluate its performance, the proposed perceptual sensitivity-based rate control (PSRC) method is incorporated into the HEVC reference software (HM10.0) and tested on multiple commonly-used sequences. The test conditions are the standard Low Delay (IPPP structure) setting and the Random Access setting suggested in [4]. The QP value is set to 22, 27, 32 and 37, respectively, and the first 100 frames of each test sequence are encoded. For each sequence under each QP, the target bit rate is set to the bit rate produced by the same encoding configuration with rate control disabled, and the initial QP is set to the current QP. All simulation experiments are conducted on a computer with a 3.6 GHz Intel Core i7 quad-core processor, 8 GB memory, and the Windows 7 operating system.

In this work, we use the commonly-used visual quality assessment metric SSIM [25] to measure the perceptual quality of the reconstructed video, obtained by simply averaging the SSIM values of all frames. The proposed PSRC method is compared with the original rate control method in HEVC [13], and the average difference between their rate-SSIM curves is measured according to the method in [3] using the following performance indexes: ΔSSIM denotes the average SSIM change; ΔBR denotes the total bit rate change (in percentage); "+" means an increment and "−" a decrement. In addition, the bit rate error (BRE) [13] is used to indicate the mismatch between the target and the actual bit rate.

$$ BRE=\frac{\left|{R}_{\mathrm{target}}-{R}_{\mathrm{actual}}\right|}{R_{\mathrm{target}}}\times 100\% $$
(16)

where $R_{target}$ and $R_{actual}$ are the target bit rate and the actual output bit rate, respectively. A lower BRE value indicates a better match between the target and the actual output bit rate.
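For example (a purely hypothetical illustration, not a result from the experiments), if the target bit rate is 1000 kbps and the actual output bit rate is 1018 kbps, then BRE = |1000 − 1018| / 1000 × 100 % = 1.8 %.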

Table 1 shows the performance of the proposed PSRC method for HEVC under the low delay and random access settings, respectively, compared with the original rate control in HEVC [13]. One can see that the proposed PSRC method achieves, on average, a 6.94 % bit rate reduction and a 0.0077 SSIM improvement under the low delay setting. Under the random access setting, an average bit rate saving of 5.71 % and an SSIM increment of 0.0058 are obtained. In addition, Table 2 shows the bit rate error of the proposed PSRC method and of the original rate control in HEVC [13] under the low delay and random access settings, respectively. The proposed PSRC method achieves a bit rate mismatch between target and actual output bits similar to that of the original rate control in HEVC [13]. Moreover, Figs. 1 and 2 show examples of frames reconstructed by the original rate control in HEVC [13] and by the proposed PSRC method under the Low Delay and Random Access settings, respectively. One can see that the proposed PSRC method effectively allocates the bits according to the perceptual sensitivity of the input video, significantly reducing the bits while keeping similar perceptual quality. Similar observations hold for the other sequences. Therefore, the proposed PSRC method effectively improves the perceptual coding performance of HEVC. Another merit of the proposed PSRC method is its negligible computational overhead, since the complexity of the perceptual sensitivity measurement is very small compared with the encoding process itself.

Table 1 The performance of the proposed PSRC method under the low delay and random access settings, compared with the original rate control in HEVC [13]
Table 2 The bit rate error of the original rate control in HEVC [13] and the proposed PSRC method under the low delay and random access settings
Fig. 1 The reconstructed video frame ("BQTerrace", 17th frame, QP = 32, Low Delay setting)

Fig. 2 The reconstructed video frame ("BasketballDrill", 3rd frame, QP = 27, Random Access setting)

5 Conclusions

In this paper, an efficient perceptual rate control method, called perceptual sensitivity-based rate control (PSRC), is proposed for HEVC. The perceptual coding gain is achieved by adaptively allocating the bits for each frame and each CTU in the RDO process based on their perceptual sensitivity. More specifically, a CTU with lower perceptual sensitivity, which can tolerate more distortion, is assigned fewer bits. The perceptual sensitivity is evaluated using two perceptual features that effectively reflect human perception: spatial texture complexity and temporal motion activity. Experimental results show the efficiency of the proposed PSRC method in improving the perceptual coding performance compared with the original rate control in HEVC.