Synthesized Views Distortion Model Based Rate Control in 3D-HEVC

Tan, Songchao; Ma, Siwei; Wang, Shanshe; Gao, Wen

doi:10.1007/978-3-319-24078-7_3


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9315)

Included in the following conference series:

Pacific Rim Conference on Multimedia


Abstract

In this paper, we propose a rate control algorithm based on a synthesized view distortion model for 3D-HEVC, the 3D video compression standard based on High Efficiency Video Coding (HEVC). The paper makes two main contributions. First, we investigate the distortion dependency between the synthesized views and the coded views, including texture video and depth maps, and propose a synthesized view distortion model for 3D-HEVC. Second, based on this distortion model, we propose an efficient joint bit allocation scheme. Experimental results show that the proposed rate control algorithm achieves better performance on both the coded texture views and the synthesized views. The maximum overall performance improvement (over all coded texture views and all synthesized views) is 14.4%, and the average BD-rate gain is 6.9%. Moreover, the algorithm accurately controls the bit rate to satisfy the total bit rate constraint.



1 Introduction

The 3D extension of High Efficiency Video Coding [1] (3D-HEVC) was developed by the Joint Collaborative Team on 3D Video Coding Extension Development (JCT-3V) of ISO/IEC MPEG and ITU-T VCEG. 3D video coding aims to code the visual information of a 3D scene, which usually consists of multi-view texture data and the corresponding depth information [2]. In 3D-HEVC, one view is selected as the base view and coded independently of the other views to provide backward compatibility with HEVC decoders. The other views (termed dependent views) are coded with inter-view prediction from the base view or other reference views to reduce the redundancy between views.

As an important module of a video encoder, rate control (RC) regulates the output bit rate while maintaining good video quality. Rate control has long been an active research topic for every video coding standard, and many schemes have been proposed, such as the quadratic model for H.264 [3], and the URQ model [4], R-lambda [5] and rate-GOP [6] for HEVC. For 3D-HEVC, however, rate control becomes more complicated. Unlike traditional 2D video coding standards, 3D-HEVC uses depth maps to generate synthesized virtual views. The quality of the synthesized views depends on the quality of both the texture video and the depth maps [7], so the bit allocation between texture video and depth maps must be balanced to obtain good synthesized view quality. In addition, 3D-HEVC adopts many other coding tools to improve coding performance. All of these factors make it challenging to establish an accurate rate-distortion (R-D) model and bit allocation scheme for 3D-HEVC rate control.

Several rate control schemes have been proposed for 3D-HEVC in the literature. Two representative methods, URQ and R-lambda, were extended from HEVC to 3D-HEVC in [8, 9]. To further improve performance, depth-map-based inter-view MAD prediction has been proposed to improve the accuracy of predicting the bits to be generated for the current unit. However, there is still considerable room for R-D performance improvement. In our previous work [10], we proposed an adaptive rate control scheme for 3D-HEVC that significantly reduces the bit rate mismatch and improves R-D performance compared with the above two algorithms.

In this paper, we further propose a novel rate control scheme based on the synthesized views distortion model in 3D-HEVC. Firstly, we investigate the distortion dependency between the synthesized views and the input texture video and depth maps, and formulate a distortion model for synthesized views. Secondly, based on this model, the bit allocation scheme for texture video and depth maps is formulated as an optimization problem.

The rest of this paper is organized as follows. In Sect. 2, the R-D characteristics of both the coded views and synthesized views are investigated. In Sect. 2.1, the R-D model for coded texture views is proposed. A view synthesis distortion model to characterize the distortion dependency of the texture video and the depth maps on the synthesized virtual views is investigated in Sect. 2.2. In Sect. 2.3, an effective joint bit allocation based rate control scheme is designed for 3D-HEVC. In Sect. 3, the experimental results are given to demonstrate the efficiency of the proposed RC algorithm. Finally, Sect. 4 concludes this paper.

2 Rate and Distortion Analysis in 3D-HEVC

As illustrated in Fig. 1, the texture videos are captured by a synchronized multi-camera array, and the associated depth maps are generated for virtual view synthesis. At the encoder, the texture video and depth maps are encoded with 3D-HEVC. At the client side, arbitrary virtual views are synthesized from the decoded texture video and depth maps, and the decoded texture views and the synthesized views are presented to the viewer. Therefore, the quality of both the coded texture views and the synthesized virtual views should be optimized jointly under the total bit rate budget:

$$ \begin{aligned} & \hbox{min} \left( {D_{v} + D_{c} } \right), \\ & s.t. R_{d} + R_{t} \le R_{c} , \\ \end{aligned} $$

(1)

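where $ R_{t} $ and $ R_{d} $ are the bit rates of the texture video and depth maps, respectively, and $ R_{c} $ is the total bit rate budget. $ D_{c} $ and $ D_{v} $ are the distortions of the coded texture video and the synthesized views, respectively; they are modeled below as $ D_{t} $ in (2) and $ D_{s} $ in (5).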

To solve (1), we need to model the rate-distortion (R-D) relationships of the coded texture views and the synthesized views, respectively.

2.1 R-D Model for the Coded Texture Views

To obtain the R-D characteristics of the coded views, we encode the original texture video with four quantization parameters (QPs): 25, 30, 35 and 40. As an example, the R-D curve relating texture distortion to bit rate for the test sequence ‘Newspaper_CC’ is shown in Fig. 2.

It can be observed that a power function fits the R-D points of the texture video well:

$$ D_{t} \left( {R_{t} } \right) \cong \alpha_{c} R_{t}^{{ - \beta_{c} }} , $$

(2)

where $ D_{t} $ is the distortion of the coded texture views. $ R_{t} $ is the bit rate of texture views. $ \alpha_{c} $ and $ \beta_{c} $ are model parameters.
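Because (2) is linear in log-log coordinates, $ \log D_{t} = \log \alpha_{c} - \beta_{c} \log R_{t} $, its two parameters can be fitted by ordinary least squares on the logarithms of the measured (rate, MSE) pairs, such as the four QP points of Fig. 2. The sketch below is a minimal illustration under that assumption. The function name is hypothetical, and fitting in the log domain is our choice: the paper states only that the parameters are computed by least squares. The same routine applies unchanged to the single power laws (3) and (4).

```python
import math


def fit_power_law(rates, distortions):
    """Fit D = alpha * R**(-beta) by least squares on (log R, log D).

    Returns (alpha, beta). Rates and distortions must be positive.
    """
    if len(rates) != len(distortions) or len(rates) < 2:
        raise ValueError("need at least two (rate, distortion) pairs")
    xs = [math.log(r) for r in rates]
    ys = [math.log(d) for d in distortions]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("rates must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # slope of log D versus log R, i.e. -beta
    intercept = mean_y - slope * mean_x    # intercept, i.e. log alpha
    return math.exp(intercept), -slope


# Synthetic check: points generated from alpha = 5000, beta = 0.8
# (illustrative values, not measurements from the paper).
rates = [400.0, 900.0, 2000.0, 4500.0]
dists = [5000.0 * r ** -0.8 for r in rates]
alpha, beta = fit_power_law(rates, dists)
print(f"alpha = {alpha:.1f}, beta = {beta:.3f}")
```

On real encoder measurements the points will not lie exactly on a power law. The log-domain fit then minimizes relative rather than absolute MSE error, which is one reasonable choice among several.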

2.2 R-D Model for Synthesized Views

To find the best split of the bit budget between texture and depth, we also need to establish the relationship between bit rate and synthesized view distortion. We therefore investigate how the texture bit rate ($ R_{t} $) and the depth bit rate ($ R_{d} $) affect the quality of the synthesized views.

In Fig. 3, the influence of texture and depth quality on the synthesized views is investigated by varying the texture quantization parameter $ Q^{T} $ from 5 to 45 while fixing the depth quantization parameter $ Q^{D} $ at 24, 34, 39, 44 and 49, respectively. The distortion of the synthesized views ($ D_{s} $) is measured in terms of MSE. As shown in Fig. 3, when one of $ R_{t} $ and $ R_{d} $ is fixed, the relationship between $ D_{s} $ and the other rate can be approximated by a power function.

The relationship between $ D_{s} $ and $ R_{t} $ is modeled as

$$ D_{s} \left( {R_{t} } \right) = \alpha_{t} R_{t}^{{ - \beta_{t} }} . $$

(3)

Similarly, the relationship between $ D_{s} $ and $ R_{d} $ is modeled as

$$ D_{s} \left( {R_{d} } \right) = \alpha_{d} R_{d}^{{ - \beta_{d} }} . $$

(4)

Combining (3) and (4), and treating the texture and depth contributions as additive, we obtain the distortion model for the synthesized views:

$$ D_{s} = \alpha_{t} R_{t}^{{ - \beta_{t} }} + \alpha_{d} R_{d}^{{ - \beta_{d} }} , $$

(5)

where $ D_{s} $ is the distortion of the synthesized views, $ R_{t} $ and $ R_{d} $ are the bit rates of the texture and depth, and $ \alpha_{t} $, $ \beta_{t} $, $ \alpha_{d} $ and $ \beta_{d} $ are model parameters.

2.3 A Joint Bit Allocation Based RC Scheme for 3D-HEVC

Rate control for 3D-HEVC must solve bit allocation at the texture/depth level, the view level and the frame level. The optimal texture/depth bit allocation distributes the bit budget between texture and depth so that the total distortion of the synthesized views and the coded views is minimized. Based on the R-D models for the coded texture views (2) and the synthesized views (5), we formulate the optimal bit allocation for overall quality as

$$ \begin{aligned} & \left( {R_{t}^{opt} ,R_{d}^{opt} } \right) = \arg \hbox{min} \left( {\alpha_{t} R_{t}^{{ - \beta_{t} }} + \alpha_{c} R_{t}^{{ - \beta_{c} }} + \alpha_{d} R_{d}^{{ - \beta_{d} }} } \right), \\ & s.t. R_{t} + R_{d} \le R_{c} . \\ \end{aligned} $$

(6)

Let ζ denote the ratio between the optimal texture and depth bit rates:

$$ \upzeta = \frac{{R_{t}^{opt} }}{{R_{d}^{opt} }}. $$

(7)

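Since every term in (6) decreases monotonically with its rate, the constraint is active at the optimum, i.e., $ R_{t} + R_{d} = R_{c} $. Combining this with (7) gives $ R_{t} = \frac{\upzeta}{1 + \upzeta} R_{c} $ and $ R_{d} = \frac{1}{1 + \upzeta} R_{c} $, so (6) reduces to an optimization over the single variable ζ: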

$$ \upzeta^{opt} = \arg \mathop{\hbox{min}}\limits_{\upzeta > 0} \left( {\alpha_{t} \left( {\frac{\upzeta}{{1 +\upzeta}}R_{c} } \right)^{{ - \beta_{t} }} + \alpha_{c} \left( {\frac{\upzeta}{{1 +\upzeta}}R_{c} } \right)^{{ - \beta_{c} }} + \alpha_{d} \left( {\frac{1}{{1 +\upzeta}}R_{c} } \right)^{{ - \beta_{d} }} } \right). $$

(8)

Many optimization methods can solve (8). In this paper, the Newton iterative method is used to obtain an approximate optimal value of ζ. The target bit rates for the texture and depth are then

$$ R_{t} = R_{c} \cdot \frac{\upzeta}{{1 +\upzeta}} , $$

(9)

$$ R_{d} = R_{c} \cdot \frac{1}{{1 +\upzeta}} . $$

(10)
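The one-variable problem (8) can be illustrated with a safeguarded Newton iteration. The sketch below is a hedged illustration, not the authors' code. It iterates on the equivalent variable $ R_{t} = \frac{\upzeta}{1 + \upzeta} R_{c} $, which maps ζ in (0, ∞) one-to-one onto (0, $ R_{c} $). In that variable, and for positive $ \alpha_{t} $, $ \alpha_{d} $ and β's with $ \alpha_{c} \ge 0 $, the objective is strictly convex, so its derivative has a unique root. A bisection fallback keeps every iterate inside the bracket. The function name and the reparameterization are our assumptions. ζ, $ R_{t} $ and $ R_{d} $ are read off as in (7), (9) and (10).

```python
def allocate_texture_depth(alpha_t, beta_t, alpha_c, beta_c, alpha_d, beta_d,
                           r_c, tol=1e-10, max_iter=100):
    """Minimize the objective of (8) and return (zeta, r_t, r_d).

    Iterates on r_t = zeta / (1 + zeta) * r_c, with r_d = r_c - r_t.
    Assumes alpha_t, alpha_d and all betas are positive and alpha_c >= 0.
    """
    def grad(r_t):
        # First derivative of the objective with respect to r_t.
        r_d = r_c - r_t
        return (-alpha_t * beta_t * r_t ** (-beta_t - 1)
                - alpha_c * beta_c * r_t ** (-beta_c - 1)
                + alpha_d * beta_d * r_d ** (-beta_d - 1))

    def hess(r_t):
        # Second derivative; strictly positive, so the objective is convex.
        r_d = r_c - r_t
        return (alpha_t * beta_t * (beta_t + 1) * r_t ** (-beta_t - 2)
                + alpha_c * beta_c * (beta_c + 1) * r_t ** (-beta_c - 2)
                + alpha_d * beta_d * (beta_d + 1) * r_d ** (-beta_d - 2))

    lo, hi = 0.0, r_c      # bracket that contains the root of grad
    x = 0.5 * r_c          # start from an even texture/depth split
    for _ in range(max_iter):
        g = grad(x)
        if g == 0.0:
            break
        if g < 0.0:
            lo = x         # optimum lies to the right of x
        else:
            hi = x         # optimum lies to the left of x
        x_new = x - g / hess(x)          # Newton step
        if not lo < x_new < hi:
            x_new = 0.5 * (lo + hi)      # fall back to bisection
        converged = abs(x_new - x) < tol * r_c
        x = x_new
        if converged:
            break
    r_t, r_d = x, r_c - x
    return r_t / r_d, r_t, r_d


# Illustrative parameters only (not values fitted in the paper).
zeta, r_t, r_d = allocate_texture_depth(2.0e4, 0.9, 1.5e4, 0.8, 5.0e3, 0.6,
                                        r_c=3000.0)
print(f"zeta = {zeta:.3f}, R_t = {r_t:.1f}, R_d = {r_d:.1f}")
```

Working in $ R_{t} $ rather than ζ avoids the unbounded domain of ζ. Because the map from ζ to $ R_{t} $ is monotone, both variables give the same minimizer.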

To estimate the model parameters, the frames of the first GOP are encoded first, and the parameters are then computed by the least squares method.

Once the target bit rates for texture and depth are determined, the bit rate ratio between views can be determined by statistical analysis. In this paper, we allocate bits to the different views according to the ratio of bits between the base view and the dependent views in the anchor encoding.
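The view-level step is a proportional split of a budget by the anchor's per-view bits. A minimal sketch follows, with a hypothetical helper name. It assumes, since the text does not say so explicitly, that the rule is applied separately to the texture budget $ R_{t} $ and the depth budget $ R_{d} $. The anchor bit counts in the example are made up.

```python
def split_by_anchor_ratio(budget, anchor_bits):
    """Divide a bit budget among views in proportion to the anchor's per-view bits.

    anchor_bits lists the anchor's bits per view, e.g. [base, dependent 1, dependent 2].
    """
    total = sum(anchor_bits)
    if total <= 0:
        raise ValueError("anchor bits must sum to a positive value")
    return [budget * bits / total for bits in anchor_bits]


# Hypothetical anchor bits for [base view, dependent view 1, dependent view 2].
print(split_by_anchor_ratio(1200.0, [500.0, 250.0, 250.0]))  # [600.0, 300.0, 300.0]
```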

After the target bit rate has been allocated at the texture/depth level and the view level, it must be distributed among frames. We use the frame level bit allocation proposed in our previous work [10]:

$$ R_{n,i} = \left\{ {\begin{array}{*{20}l} {R_{n}^{remain} \cdot {\kern 1pt} \phi } \hfill & {I\,frame} \hfill \\ {R_{n}^{remain} \cdot w_{i} \cdot \left( {1 - \phi } \right)} \hfill & {others} \hfill \\ \end{array} } \right., $$

(11)

$$ R_{n}^{remain} = \frac{bit\,rate}{frame\,rate} \cdot N_{n} + \frac{R_{n - 1}^{remain} - R_{n - 1}^{actual}}{N_{rest}^{G}}, $$

(12)

where $ R_{n,i} $ is the target number of bits for the i-th frame in the n-th GOP, $ R_{n}^{remain} $ and $ R_{n}^{actual} $ are the target and actual bits of the n-th GOP, $ N_{n} $ is the number of frames in the n-th GOP, and $ N_{rest}^{G} $ is the number of remaining GOPs that have not yet been coded. $ \phi $ is the fraction of the GOP budget assigned to the I frame; based on experiments, it is set to 0.4 for the first GOP and 0.25 for the remaining GOPs. $ w_{i} $ is an empirically determined weight of the i-th frame in the random access (RA) hierarchical structure:

$$ w_{i} = \left\{ {\begin{array}{*{20}l} {0.07,} \hfill & {\text{if } POC \bmod 8 = 0} \hfill \\ {0.056,} \hfill & {\text{if } POC \bmod 8 = 4} \hfill \\ {0.0454,} \hfill & {\text{if } POC \bmod 4 = 2} \hfill \\ {0.035,} \hfill & {\text{otherwise}} \hfill \\ \end{array} } \right. $$

(13)

where POC (picture order count) indicates the output order of a picture in the video stream.

When overflow or underflow occurs, the difference between the target and actual bits of a GOP is distributed evenly over the remaining GOPs, as expressed by the second term of (12).
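Equations (11) to (13) can be sketched directly. This is a hedged illustration, not the authors' implementation, and all function names are hypothetical. The example assumes a 24-frame GOP with the I frame at POC 24, which is our assumption. Under that layout the weights of (13) for the 23 non-I frames (POC 1 to 23) sum to 1.0004, so (11) hands out almost exactly the $ (1 - \phi) $ share of the GOP budget.

```python
def frame_weight(poc):
    """Empirical random-access frame weight w_i from (13)."""
    if poc % 8 == 0:
        return 0.07
    if poc % 8 == 4:
        return 0.056
    if poc % 4 == 2:
        return 0.0454
    return 0.035


def gop_budget(bit_rate, frame_rate, n_frames,
               prev_remain=0.0, prev_actual=0.0, n_rest_gops=1):
    """R_n^remain from (12): nominal GOP budget plus the previous GOP's
    surplus (or deficit) spread evenly over the GOPs not yet coded."""
    return (bit_rate / frame_rate * n_frames
            + (prev_remain - prev_actual) / n_rest_gops)


def frame_target(remain, poc, is_intra, first_gop):
    """R_{n,i} from (11), with phi = 0.4 for the first GOP and 0.25 afterwards."""
    phi = 0.4 if first_gop else 0.25
    if is_intra:
        return remain * phi
    return remain * frame_weight(poc) * (1.0 - phi)


# Hypothetical example: 1 Mbit/s at 25 frames/s, 24-frame GOP; the previous
# GOP overshot its 960000-bit target by 6000 bits and 3 GOPs remain.
remain = gop_budget(1_000_000, 25, 24, prev_remain=960_000,
                    prev_actual=966_000, n_rest_gops=3)
print(remain)                                                     # 958000.0
print(frame_target(remain, 24, is_intra=True, first_gop=False))   # 239500.0
print(sum(frame_weight(p) for p in range(1, 24)))                 # about 1.0004
```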

The trade-off between the output bit rate (R) and the quality (D) of the compressed video is determined by the quantization step size (Qs), which is indexed by the quantization parameter (QP). R-Qs and D-Qs models have been studied extensively for previous video coding standards such as H.264/AVC and HEVC. Here we use the linear model proposed in our previous work [7]:

$$ R = \alpha \times X/QP, $$

(14)

where α is a model parameter, R is the coding rate, QP is the quantization parameter, and X is the complexity estimate for the current picture, computed as

$$ X = \left( {\sum\limits_{i = 0}^{n} {(w_{i} \times SAD_{i} )} /\sum\limits_{i = 0}^{n - 1} {(w_{i} \times SAD_{i} )} } \right)^{1 - \lambda } \times R_{n - 1} \times QP_{n - 1} , $$

(15)

where n is the current frame number, $ QP_{n-1} $ is the quantization parameter of the (n-1)-th frame, and $ R_{n-1} $ is the actual number of bits of the (n-1)-th frame. The weight $ w_{i} $ here (distinct from the frame weight in (13)) is defined as

$$ w_{i} = \frac{0.5^{n - i}}{\sum\nolimits_{j = 0}^{n} 0.5^{n - j}} $$

(16)
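Equations (14) to (16) combine into a QP decision for a frame with a given bit target. The sketch below is hedged on several points. The helper names are hypothetical. λ in (15) is not defined in the text, so it is a caller-supplied parameter `lam`. $ SAD_{n} $ of the current frame is assumed to be available, for example from a pre-analysis pass, which the paper does not describe. The sums follow (15) literally. The resulting QP is continuous; an encoder would round it and clip it to the valid range (0 to 51 for 8-bit HEVC).

```python
def history_weights(n):
    """w_i of (16) for i = 0..n: weights halve with each frame of age and sum to 1."""
    raw = [0.5 ** (n - i) for i in range(n + 1)]
    total = sum(raw)
    return [r / total for r in raw]


def complexity(sads, r_prev, qp_prev, lam):
    """X of (15). sads[i] is SAD_i for frames 0..n, where n = len(sads) - 1.

    lam is the lambda of (15); its value is not given in the text (assumption).
    """
    n = len(sads) - 1
    if n < 1:
        raise ValueError("need SADs for at least two frames")
    w = history_weights(n)
    num = sum(w[i] * sads[i] for i in range(n + 1))
    den = sum(w[i] * sads[i] for i in range(n))
    return (num / den) ** (1.0 - lam) * r_prev * qp_prev


def qp_for_target(alpha, x, r_target):
    """Invert (14), R = alpha * X / QP, for the QP expected to produce r_target bits."""
    return alpha * x / r_target


# Hypothetical values: three frames of SAD history, previous frame coded
# with 42000 bits at QP 32, target of 40000 bits for the current frame.
x = complexity([1800.0, 2100.0, 1950.0], r_prev=42_000.0, qp_prev=32.0, lam=0.5)
print(round(qp_for_target(alpha=1.0, x=x, r_target=40_000.0), 2))
```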

3 Experimental Results

To evaluate the proposed 3D-HEVC rate control algorithm, we integrate it into the reference software HTM10.0 and compare it with the R-lambda algorithm. We test all eight sequences defined in the common test conditions (CTCs), at resolutions of 1024 × 768 and 1920 × 1088. Each sequence consists of three views: the left, the center (coded first) and the right view. After coding, six synthesized views are rendered.

3.1 Control Accuracy

To evaluate the accuracy of the bit rate control, the following measurement is adopted.

$$ Error = \frac{{\left| {R_{actual} - R_{target} } \right|}}{{R_{target} }} \times 100\% , $$

(17)

where Error is the bit rate error, and $ R_{target} $ and $ R_{actual} $ are the target bits and the actual output bits, respectively.
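As a worked instance of (17), a sequence that produces 1012 kbit/s against a 1000 kbit/s target has a mismatch of 1.2%. The rates are hypothetical, and the helper name below is our own.

```python
def rate_error_percent(r_actual, r_target):
    """Bit rate mismatch of (17), in percent of the target."""
    if r_target <= 0:
        raise ValueError("target rate must be positive")
    return abs(r_actual - r_target) / r_target * 100.0


# Hypothetical rates in kbit/s.
print(f"{rate_error_percent(1012.0, 1000.0):.2f}%")  # 1.20%
```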

As shown in Table 1, the proposed RC algorithm achieves a smaller mismatch between target bits and actual output bits. This is because the frame level bit allocation from our previous work [10] is better suited to I slices, instead of relying on an overflow/underflow handling strategy.

Table 1. R-D performance of the proposed algorithm compared with the anchor and R-lambda

3.2 R-D Performance

To objectively evaluate the performance of the proposed RC algorithm, the R-lambda algorithm proposed in [9] is used for comparison. For [9], the target bit rate of each texture video is set to the corresponding bit rate of the HTM anchor, and the depth maps are coded with the same fixed QPs as the anchor. For the proposed algorithm, the target bit rate for all coded views (three texture videos and three depth maps) is set to the anchor's total bit rate.

As shown in Table 1 and Fig. 4, the proposed algorithm achieves much better R-D performance than R-lambda for both the coded texture views and the synthesized views, because the proposed synthesized view distortion model yields an optimized bit allocation between texture and depth. The maximum performance improvement over all views (coded texture views and synthesized views) is 14.4%, and the average BD-rate gain is 6.9%.

Fig. 4 also shows two R-D curves, in which the proposed RC algorithm outperforms the R-lambda model at both high and low bit rates.

4 Conclusions

This paper has presented a joint bit allocation and rate control method based on a synthesized view distortion model to achieve the best overall quality for 3D-HEVC. The distortion dependency between the coded views and the synthesized views is investigated. The proposed bit allocation operates at three levels: texture/depth, view and frame. Experiments on different video sequences show that the proposed method achieves much better R-D performance than the compared R-lambda rate control algorithm for 3D-HEVC.

References

[1] Bross, B., Han, W.-J., Sullivan, G.J., Ohm, J.-R., Wiegand, T.: High efficiency video coding (HEVC) text specification draft 8. JCTVC-J1003, Stockholm, July 2012
[2] Kauff, P., Atzpadin, N., Fehn, C., Müller, M., Schreer, O., Smolic, A., Tanger, R.: Depth map creation and image based rendering for advanced 3DTV services providing interoperability and scalability. Signal Processing: Image Communication, Special Issue on 3DTV, pp. 217–234, February
[3] Ma, S., Gao, W., Lu, Y.: Rate-distortion analysis for H.264/AVC video coding and its application to rate control. IEEE Trans. Circ. Syst. Video Technol. 15(12), 1533–1544 (2005)
[4] Choi, H., Nam, J., Yoo, J., Sim, D., Bajić, I.V.: Rate control based on unified RQ model for HEVC. JCT-VC of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, JCTVC-H0213 (m23088), San José, CA, USA, February 2012
[5] Li, B., Li, H., Li, L., Zhang, J.: λ domain rate control algorithm for high efficiency video coding. IEEE Trans. Image Process. 23(9), 3841–3854 (2014)
[6] Wang, S., Ma, S., Wang, S., Zhao, D., Gao, W.: Rate-GOP based rate control for high efficiency video coding. IEEE J. Sel. Top. Sign. Process. 7(6), 1101–1111 (2013)
[7] Ma, S., Wang, S., Gao, W.: Low complexity adaptive view synthesis optimization in HEVC based 3D video coding. IEEE Trans. Multimedia 16(1), 266–271 (2014)
[8] Lim, W., Sim, D., Bajić, I.V.: Improvement of the rate control for 3D multi-view video coding. ISO/IEC JTC1/SC29/WG11, JCT3V-C0090, Geneva, Switzerland, January 2013
[9] Lim, W., Sim, D., Bajić, I.V.: The rate control schemes for 3D multi-view video coding. ISO/IEC JTC1/SC29/WG11, JCT3V-D0111, Incheon, KR, April 2013
[10] Tan, S., Si, J., Ma, S., Wang, S., Gao, W.: Adaptive frame level rate control in 3D-HEVC. Visual Communication and Image Processing, Malta, December 2014


Author information

Authors and Affiliations

School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, China
Songchao Tan
Institute of Digital Media, Peking University, Beijing, 100871, China
Siwei Ma, Shanshe Wang & Wen Gao


Corresponding author

Correspondence to Songchao Tan.

Editor information

Editors and Affiliations

Gwangju Institute of Science and Technology, Gwangju, Korea (Republic of)
Yo-Sung Ho
Chinese Academy of Sciences, Institute of Automation, Beijing, China
Jitao Sang
KAIST, Daejeon, Korea (Republic of)
Yong Man Ro
KAIST, Daejeon, Korea (Republic of)
Junmo Kim
College of Computer Science, Zhejiang University, Hangzhou, China
Fei Wu


About this paper

Cite this paper

Tan, S., Ma, S., Wang, S., Gao, W. (2015). Synthesized Views Distortion Model Based Rate Control in 3D-HEVC. In: Ho, YS., Sang, J., Ro, Y., Kim, J., Wu, F. (eds) Advances in Multimedia Information Processing -- PCM 2015. PCM 2015. Lecture Notes in Computer Science, vol 9315. Springer, Cham. https://doi.org/10.1007/978-3-319-24078-7_3


DOI: https://doi.org/10.1007/978-3-319-24078-7_3
Published: 15 December 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24077-0
Online ISBN: 978-3-319-24078-7
eBook Packages: Computer Science, Computer Science (R0)
