Abstract
In this paper, we propose a rate control algorithm based on a synthesized views distortion model for the high efficiency video coding (HEVC) based 3D video compression standard (3D-HEVC). The paper makes two main contributions. First, we investigate the distortion dependency between the synthesized views and the coded views, including texture video and depth maps. Second, we propose a synthesized views distortion model for 3D-HEVC and, based on this model, an efficient joint bit allocation scheme. Experimental results show that the proposed rate control algorithm achieves better performance on both the coded texture views and the synthesized views. The maximum overall performance improvement (including all coded texture views and all synthesized views) reaches 14.4% and the average BD-rate gain is 6.9%. Moreover, the algorithm accurately controls the bit rate to satisfy the total bit rate constraint.
1 Introduction
The 3D extension of High Efficiency Video Coding [1] (3D-HEVC) was developed by the Joint Collaborative Team on 3D Video Coding Extension Development (JCT-3V) led by ISO/IEC MPEG and ITU-T VCEG. 3D video coding aims at coding the visual information of a 3D scene, which usually consists of multi-view texture data and the corresponding depth information [2]. In 3D-HEVC, one view is selected as the base view, which is coded independently of the other views to provide backward compatibility with HEVC decoders. The other views (termed dependent views) are coded with inter-view prediction from the base view or other reference views to reduce the redundancy between views.
As an important module of a video encoder, rate control (RC) is employed to regulate the bit rate while guaranteeing good video quality. Rate control has been an active research topic for every video coding standard, and many rate control schemes have been proposed, such as the quadratic model for H.264 [3] and the URQ model [4], R-lambda [5] and rate-GOP [6] for HEVC. For 3D-HEVC, however, rate control becomes more complicated. Unlike traditional 2D video coding standards, 3D-HEVC uses the depth maps to generate the synthesized virtual views. The coding quality of the synthesized views depends on the quality of both the texture video and the depth maps [7]. It is therefore important to balance the bit allocation between texture video and depth maps to obtain better quality in the synthesized views. In addition, 3D-HEVC adopts many other techniques to improve coding performance. All these techniques make it challenging to establish an accurate rate-distortion (R-D) model and bit allocation scheme for rate control in 3D-HEVC.
In the literature, several rate control schemes for 3D-HEVC have been proposed. Two representative rate control methods, URQ and R-lambda, have been proposed for 3D-HEVC in [8,9] as extensions of the corresponding methods in HEVC. To obtain better performance, depth-map-based inter-view MAD prediction is used to improve the prediction accuracy of the bits to be generated for the current unit. However, there is still considerable room for R-D performance improvement. In our previous work [10], we proposed an adaptive rate control scheme for 3D-HEVC. Both the bit rate mismatch and the R-D performance of that algorithm are significantly improved compared with the above two algorithms.
In this paper, we further propose a novel rate control scheme based on the synthesized views distortion model in 3D-HEVC. Firstly, we investigate the distortion dependency between the synthesized views and the input texture video and depth maps, and formulate a distortion model for synthesized views. Secondly, based on this model, the bit allocation scheme for texture video and depth maps is formulated as an optimization problem.
The rest of this paper is organized as follows. In Sect. 2, the R-D characteristics of both the coded views and synthesized views are investigated. In Sect. 2.1, the R-D model for coded texture views is proposed. A view synthesis distortion model to characterize the distortion dependency of the texture video and the depth maps on the synthesized virtual views is investigated in Sect. 2.2. In Sect. 2.3, an effective joint bit allocation based rate control scheme is designed for 3D-HEVC. In Sect. 3, the experimental results are given to demonstrate the efficiency of the proposed RC algorithm. Finally, Sect. 4 concludes this paper.
2 Rate and Distortion Analysis in 3D-HEVC
As illustrated in Fig. 1, the texture videos are captured by synchronized multi-camera arrays. The associated depth maps are also generated for virtual view synthesis. At the encoder, the texture video and depth maps are encoded using 3D-HEVC. At the client side, arbitrary virtual views are synthesized from the decoded texture video and depth maps. The decoded texture video and the synthesized views are then presented for viewing at the receiver side. Therefore, the quality of both the coded texture views and the synthesized virtual views needs to be optimized as follows
where \( R_{t} \) and \( R_{d} \) are the bit rates of the texture video and depth maps, respectively, and \( D_{c} \) and \( D_{v} \) are the distortions of the texture video and synthesized views, respectively.
In order to model the expression in (1), we need to investigate the rate and distortion (R-D) relationship for coded texture views and synthesized views, respectively.
2.1 R-D Model for the Coded Texture Views
To obtain the R-D characteristics of the coded views, we encode the original texture video with four quantization parameter (QP) values (25, 30, 35, and 40). As an example, the R-D curve relating the texture distortion to the bit rate for the test sequence ‘Newspaper_CC’ is illustrated in Fig. 2.
It can be observed that power functions can be used to fit the R-D points of texture video well.
where \( D_{t} \) is the distortion of the coded texture views. \( R_{t} \) is the bit rate of texture views. \( \alpha_{c} \) and \( \beta_{c} \) are model parameters.
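As a minimal sketch of how such a fit could be carried out, the snippet below assumes the power form \( D_{t} = \alpha_{c} R_{t}^{\beta_{c}} \) (an assumption consistent with the parameters defined above) and estimates the two parameters by linear least squares in the log-log domain. The function name `fit_power_model` and the four R-D points are hypothetical placeholders, not measurements from the experiments.

```python
import math


def fit_power_model(rates, distortions):
    """Fit D = alpha * R**beta by least squares on (log R, log D)."""
    xs = [math.log(r) for r in rates]
    ys = [math.log(d) for d in distortions]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx                           # slope in the log-log domain
    alpha = math.exp(mean_y - beta * mean_x)   # intercept mapped back
    return alpha, beta


# Hypothetical R-D points (rate in kbps, distortion as MSE),
# generated from D = 900 * R**-0.8 purely for illustration.
rates = [200.0, 400.0, 800.0, 1600.0]
distortions = [900.0 * r ** -0.8 for r in rates]
alpha_c, beta_c = fit_power_model(rates, distortions)
```

With real measurements the points would not lie exactly on a power curve, and the fitted parameters would be the least-squares compromise in the log domain.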
2.2 R-D Model for Synthesized Views
To find the best split of the bit budget between texture and depth, we also need to establish the relationship between the bit rate and the distortion of the synthesized views. We investigate how the bit rate of the texture video (\( R_{t} \)) and the bit rate of the depth maps (\( R_{d} \)) influence the quality of the synthesized views.
In Fig. 3, the influence of texture and depth quality on the synthesized views is investigated by varying the texture quantization parameter \( Q_{T} \) from 5 to 45 while fixing the depth quantization parameter \( Q_{D} \) at 24, 34, 39, 44 and 49, respectively. The distortion of the synthesized views (\( D_{s} \)) is measured in terms of MSE. As shown in Fig. 3, once one of \( R_{t} \) and \( R_{d} \) is fixed, the relationship between \( D_{s} \) and the other rate can be approximated by a power function.
The \( D_{s} \)-\( R_{t} \) relationship is modeled as
and the \( D_{s} \)-\( R_{d} \) relationship as
Therefore, from (3) and (4), we obtain the distortion model for the synthesized views as follows,
where \( D_{s} \) is the distortion of synthesized views. \( R_{t} \) and \( R_{d} \) are the bits for texture and depth. \( \alpha_{t} \), \( \beta_{t} \), \( \alpha_{d} \) and \( \beta_{d} \) are the model parameters.
2.3 A Joint Bit Allocation Based RC Scheme for 3D-HEVC
Rate control for 3D-HEVC needs to solve the bit allocation problem at the texture/depth level, the view level and the frame level. The optimum bit allocation problem is to distribute the bit budget between texture and depth so that the distortion of the synthesized views and the coded views is minimized. Based on the proposed R-D models for the coded texture views and the synthesized views (5), we formulate the overall-quality-based optimum bit allocation as
ζ is used to represent the proportional relationship between \( R_{t} \) and \( R_{d} \), defined as
Therefore, from (6) and (7), we get the objective optimization function with only one variable ζ, as shown below,
Many optimization methods can be used to find the optimal solution of (8). In this paper, Newton's iterative method is used to obtain an approximate optimal value. The target bit rates for the texture and depth can then be expressed as follows
In order to estimate these parameters, we first encode the frames in the first GOP. Then the model parameters are calculated by the least square error method.
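The joint texture/depth allocation can be sketched as follows. This is only an illustration under stated assumptions: the objective is taken to be the sum of the coded-view and synthesized-view distortions, each term is a power function of rate, the synthesized-view model is additive in its texture and depth terms, ζ is taken to be \( R_{t}/R_{d} \), and the total rate constraint holds with equality. The function names and all parameter values are hypothetical, and the derivatives are approximated by central finite differences rather than derived analytically.

```python
def total_distortion(zeta, total_rate, params):
    """Assumed objective: D_t(R_t) + D_s(R_t, R_d) with R_t + R_d = total_rate."""
    a_c, b_c, a_t, b_t, a_d, b_d = params
    r_t = total_rate * zeta / (1.0 + zeta)  # texture rate (zeta = R_t / R_d)
    r_d = total_rate / (1.0 + zeta)         # depth rate
    return a_c * r_t ** b_c + a_t * r_t ** b_t + a_d * r_d ** b_d


def newton_allocate(total_rate, params, zeta=1.0, tol=1e-8, max_iter=50):
    """Newton iteration on dJ/dzeta = 0 using finite-difference derivatives."""
    for _ in range(max_iter):
        h = 1e-4 * zeta
        j_minus = total_distortion(zeta - h, total_rate, params)
        j_mid = total_distortion(zeta, total_rate, params)
        j_plus = total_distortion(zeta + h, total_rate, params)
        grad = (j_plus - j_minus) / (2.0 * h)
        curv = (j_plus - 2.0 * j_mid + j_minus) / (h * h)
        if curv <= 0.0:
            break  # not locally convex; keep the current estimate
        new_zeta = zeta - grad / curv
        if new_zeta <= 0.0:
            new_zeta = zeta / 2.0  # keep the ratio positive
        converged = abs(new_zeta - zeta) < tol
        zeta = new_zeta
        if converged:
            break
    r_t = total_rate * zeta / (1.0 + zeta)
    r_d = total_rate / (1.0 + zeta)
    return zeta, r_t, r_d


# Hypothetical model parameters (alpha_c, beta_c, alpha_t, beta_t, alpha_d, beta_d).
params = (500.0, -1.0, 300.0, -1.0, 200.0, -1.0)
zeta, r_t, r_d = newton_allocate(1000.0, params)
```

For these illustrative parameters the objective reduces to 800/\( R_{t} \) + 200/\( R_{d} \), whose minimizer under a fixed total rate has \( R_{t}/R_{d} = 2 \), which gives a simple analytic check on the iteration.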
Based on the optimal target bit rate for the texture and depth, the bit rate ratio between the different views can be further determined by the statistical analysis. In this paper, we use anchor’s bits ratio between the base view and the dependent views to allocate the bits for different views.
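A proportional split of this kind might look like the sketch below. The function name `allocate_views` and the anchor bit counts are hypothetical; the only idea taken from the text is that each view receives a share of the target equal to its share of the anchor's bits.

```python
def allocate_views(target_bits, anchor_bits_per_view):
    """Split target_bits across views in proportion to the anchor's bits."""
    total = sum(anchor_bits_per_view)
    return [target_bits * bits / total for bits in anchor_bits_per_view]


# Hypothetical anchor bits: base view first, then two dependent views.
view_targets = allocate_views(1200.0, [600.0, 200.0, 200.0])
```

The same helper could be applied separately to the texture budget and the depth budget, since both are assumed to use the anchor's view ratios.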
After the target bit rate has been allocated at the texture/depth level and the view level, it needs to be allocated to the individual frames. The frame level bit allocation proposed in our previous work [10] is as follows
where \( R_{n,i} \) is the target bits for the i-th frame in the n-th GOP. \( R_{n}^{remain} \) and \( R_{n}^{actual} \) are the target and actual bits in the n-th GOP. \( N_{n} \) is the number of frames in the n-th GOP. \( N_{rest}^{G} \) is the number of remaining GOPs that have not yet been coded. \( \phi \) is the proportion of the I frame in a GOP, which is recommended to be 0.4 for the first GOP and 0.25 for the remaining GOPs based on experiments. \( w_{i} \) is the empirically determined weight of the frames in the RA hierarchical structure.
where POC denotes Picture Order Count and represents an output order of the pictures in the video stream.
When overflow or underflow occurs, the difference between the target bits and the actual bits in a GOP is distributed evenly among the remaining GOPs.
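The even redistribution described above can be illustrated with a short sketch. The function name, the list-based bookkeeping and the numbers are hypothetical and are not taken from the reference software.

```python
def redistribute_gop_error(gop_targets, coded_index, actual_bits):
    """Spread (target - actual) of the coded GOP evenly over the later GOPs."""
    remaining = len(gop_targets) - coded_index - 1
    updated = list(gop_targets)
    if remaining == 0:
        return updated  # last GOP: nothing left to compensate
    share = (gop_targets[coded_index] - actual_bits) / remaining
    for k in range(coded_index + 1, len(updated)):
        updated[k] += share  # negative share after overflow, positive after underflow
    return updated


# Hypothetical case: the first GOP overshoots its 1000-bit target by 100 bits.
targets = redistribute_gop_error([1000.0, 1000.0, 1000.0], 0, 1100.0)
```

In this example each of the two remaining GOPs gives up 50 bits, so the sequence-level budget is preserved.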
The trade-off between the output bit rate (R) and the quality (D) of the compressed video is determined by the quantization step size (Qs), which is indexed by the quantization parameter (Q). The R-Qs and D-Qs models have been studied extensively for previous video coding standards such as H.264/AVC and HEVC. Here we use a linear model proposed in our previous work [10] as follows
where α is the model parameter, R is the coding rate, QP is the quantization parameter, and X is the complexity estimate for the current picture, which is computed as follows.
where n is the current frame number, \( QP_{n-1} \) is the quantization parameter of the (n-1)-th frame, and \( R_{n-1} \) is the actual bits of the (n-1)-th frame. \( w_{i} \) is defined as:
3 Experimental Results
To evaluate the proposed 3D-HEVC rate control algorithm, we integrated it into the reference software HTM10.0 and used the R-lambda algorithm for comparison. We tested our algorithm on all eight sequences defined in the common test conditions (CTCs), with resolutions of 1024 × 768 and 1920 × 1088. Each sequence is composed of three views: the left, the center (coded first) and the right view. After coding, six synthesized views were rendered.
3.1 Control Accuracy
To evaluate the accuracy of the bit rate control, the following measurement is adopted.
where Error is the bit rate error, and \( R_{target} \) and \( R_{actual} \) are the numbers of target bits and actual output bits, respectively.
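One common way to express such a measurement is the relative mismatch in percent. The sketch below assumes that form, \( |R_{target} - R_{actual}| / R_{target} \times 100 \), which may differ from the exact definition used in the experiments. The function name and the sample values are hypothetical.

```python
def bitrate_error_percent(target_bits, actual_bits):
    """Relative mismatch between target and actual bits, in percent (assumed form)."""
    return abs(target_bits - actual_bits) / target_bits * 100.0


# Hypothetical example: 1012 bits produced against a 1000-bit target.
error = bitrate_error_percent(1000.0, 1012.0)
```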
As illustrated in Table 1, the proposed RC algorithm achieves a smaller mismatch between the target bits and the actual output bits. This is because the frame level bit allocation proposed in our previous work [10] is better suited to I slices, rather than relying on the overflow/underflow handling strategy.
3.2 R-D Performance
To evaluate the performance of the proposed RC algorithm objectively, the R-lambda algorithm proposed in [9] is used for comparison. In [9], the target bit rate of each texture video is set to the corresponding bit rate of the HTM anchor, and the depth maps are coded with the same fixed QP as the anchor. In the proposed algorithm, the total target bit rate for all coded views (three texture videos and three depth maps) is set to the anchor's total bit rate.
As illustrated in Table 1 and Fig. 4, the proposed algorithm shows much better R-D performance than R-lambda for both the coded texture views and the synthesized views. Based on the proposed synthesized views distortion model, the optimal bit allocation between texture and depth is achieved. The maximum performance improvement over all views (including coded texture views and synthesized views) reaches 14.4% and the average BD-rate gain is 6.9%.
Furthermore, two R-D curves are shown in Fig. 4. It can be observed that the proposed RC algorithm shows better R-D performance than the R-lambda model at both high and low bit rates.
4 Conclusions
This paper has presented a joint bit allocation and rate control method based on a synthesized views distortion model to achieve the best overall quality for 3D-HEVC. The distortion dependency between the coded views and the synthesized views is investigated. The proposed bit allocation method operates at three levels, namely the texture/depth level, the view level and the frame level. Experiments are conducted on different video sequences, and the results show that the proposed method achieves much better R-D performance than the R-lambda algorithm for 3D-HEVC.
References
1. Bross, B., Han, W.-J., Sullivan, G.J., Ohm, J.-R., Wiegand, T.: High efficiency video coding (HEVC) text specification draft 8. JCTVC-J1003, Stockholm, July 2012
2. Kauff, P., Atzpadin, N., Fehn, C., Müller, M., Schreer, O., Smolic, A., Tanger, R.: Depth map creation and image based rendering for advanced 3DTV services providing interoperability and scalability. Signal Processing: Image Communication, Special Issue on 3DTV, pp. 217–234, February
3. Ma, S., Gao, W., Lu, Y.: Rate-distortion analysis for H.264/AVC video coding and its application to rate control. IEEE Trans. Circ. Syst. Video Technol. 15(12), 1533–1544 (2005)
4. Choi, H., Nam, J., Yoo, J., Sim, D., Bajić, I.V.: Rate control based on unified RQ model for HEVC. JCT-VC of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, JCTVC-H0213 (m23088), San José, CA, USA, February 2012
5. Li, B., Li, H., Li, L., Zhang, J.: λ domain rate control algorithm for high efficiency video coding. IEEE Trans. Image Process. 23(9), 3841–3854 (2014)
6. Wang, S., Ma, S., Wang, S., Zhao, D., Gao, W.: Rate-GOP based rate control for high efficiency video coding. IEEE J. Sel. Top. Sign. Process. 7(6), 1101–1111 (2013)
7. Ma, S., Wang, S., Gao, W.: Low complexity adaptive view synthesis optimization in HEVC based 3D video coding. IEEE Trans. Multimedia 16(1), 266–271 (2014)
8. Lim, W., Sim, D., Bajić, I.V.: JCT3V – Improvement of the rate control for 3D multi-view video coding. ISO/IEC JTC1/SC29/WG11, JCT3V-C0090, Geneva, Switzerland, January 2013
9. Lim, W., Sim, D., Bajić, I.V.: JCT3V – The rate control schemes for 3D multi-view video coding. ISO/IEC JTC1/SC29/WG11, JCT3V-D0111, Incheon, KR, April 2013
10. Tan, S., Si, J., Ma, S., Wang, S., Gao, W.: Adaptive frame level rate control in 3D-HEVC. Visual Communication and Image Processing, Malta, December 2014
© 2015 Springer International Publishing Switzerland
Tan, S., Ma, S., Wang, S., Gao, W. (2015). Synthesized Views Distortion Model Based Rate Control in 3D-HEVC. In: Ho, YS., Sang, J., Ro, Y., Kim, J., Wu, F. (eds) Advances in Multimedia Information Processing -- PCM 2015. PCM 2015. Lecture Notes in Computer Science(), vol 9315. Springer, Cham. https://doi.org/10.1007/978-3-319-24078-7_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24077-0
Online ISBN: 978-3-319-24078-7