
1 Introduction

The 3D extension of High Efficiency Video Coding [1] (3D-HEVC) was developed by the Joint Collaborative Team on 3D Video Coding Extension Development (JCT-3V), established by ISO/IEC MPEG and ITU-T VCEG. 3D video coding aims to code the visual information of a 3D scene, which usually consists of multi-view texture data and the corresponding depth information [2]. In 3D-HEVC, one view is selected as the base view and coded independently of the other views to provide backward compatibility with HEVC decoders. The other views (termed dependent views) are coded with inter-view prediction from the base view or other reference views to reduce the redundancy between views.

As an important module of a video encoder, rate control (RC) regulates the output bit rate while maintaining good video quality. Rate control has been an active research topic for every video coding standard, and many schemes have been proposed, such as the quadratic model [3] for H.264, and the URQ [4], R-lambda [5] and rate-GOP [6] models for HEVC. For 3D-HEVC, however, rate control becomes more complicated. Unlike traditional 2D video coding standards, 3D-HEVC uses depth maps to generate synthesized virtual views, and the coding quality of the synthesized views depends on the quality of both the texture video and the depth maps [7]. It is therefore important to balance the bit allocation between texture video and depth maps to obtain better synthesized view quality. In addition, 3D-HEVC adopts many other coding tools to improve coding performance. Together, these factors make it challenging to establish an accurate rate-distortion (R-D) model and bit allocation scheme for 3D-HEVC rate control.

Several rate control schemes for 3D-HEVC have been proposed in the literature. Two representative methods, URQ and R-lambda, were extended from HEVC to 3D-HEVC in [8,9]. To obtain better performance, depth-map-based inter-view MAD prediction has also been proposed to improve the accuracy of predicting the bits to be generated for the current unit. However, there is still considerable room for R-D performance improvement. In our previous work [10], we proposed an adaptive rate control scheme for 3D-HEVC, which significantly improved both the bit rate mismatch and the R-D performance compared with these two algorithms.

In this paper, we further propose a novel rate control scheme for 3D-HEVC based on a synthesized view distortion model. First, we investigate the dependency of the synthesized view distortion on the input texture video and depth maps, and formulate a distortion model for the synthesized views. Second, based on this model, we formulate the bit allocation between texture video and depth maps as an optimization problem.

The rest of this paper is organized as follows. In Sect. 2, the R-D characteristics of both the coded views and synthesized views are investigated. In Sect. 2.1, the R-D model for coded texture views is proposed. A view synthesis distortion model to characterize the distortion dependency of the texture video and the depth maps on the synthesized virtual views is investigated in Sect. 2.2. In Sect. 2.3, an effective joint bit allocation based rate control scheme is designed for 3D-HEVC. In Sect. 3, the experimental results are given to demonstrate the efficiency of the proposed RC algorithm. Finally, Sect. 4 concludes this paper.

2 Rate and Distortion Analysis in 3D-HEVC

As illustrated in Fig. 1, the texture videos are captured by a synchronized multi-camera array, and the associated depth maps are generated for virtual view synthesis. At the encoder, the texture video and depth maps are encoded with 3D-HEVC. At the client side, arbitrary virtual views are synthesized from the decoded texture video and depth maps, and both the decoded texture video and the synthesized views are presented to the viewer. Therefore, the total distortion of the coded texture views and the synthesized virtual views should be minimized subject to the total rate budget:

$$ \begin{aligned} & \hbox{min} \left( {D_{v} + D_{c} } \right), \\ & s.t. R_{d} + R_{t} \le R_{c} , \\ \end{aligned} $$
(1)

where \( R_{t} \) and \( R_{d} \) are the bit rates of the texture video and depth maps, respectively, \( R_{c} \) is the total bit budget, and \( D_{c} \) and \( D_{v} \) are the distortions of the coded texture video and the synthesized views, respectively.

Fig. 1.

General framework of the 3D-HEVC [1] system

In order to model the expression in (1), we need to investigate the rate and distortion (R-D) relationship for coded texture views and synthesized views, respectively.

2.1 R-D Model for the Coded Texture Views

To obtain the R-D characteristics of the coded views, we encode the original texture video with four quantization parameters (QPs): 25, 30, 35 and 40. As an example, Fig. 2 shows the R-D curve relating texture distortion to bit rate for the test sequence ‘Newspaper_CC’.

Fig. 2.

The relationship between distortion and bit rate of coded texture views, ‘Newspaper_CC’.

It can be observed that a power function fits the R-D points of the texture video well:

$$ D_{t} \left( {R_{t} } \right) \cong \alpha_{c} R_{t}^{{ - \beta_{c} }} , $$
(2)

where \( D_{t} \) is the distortion of the coded texture views, \( R_{t} \) is the bit rate of the texture views, and \( \alpha_{c} \) and \( \beta_{c} \) are model parameters.
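Fitting a power law such as (2) to a handful of R-D points can be illustrated with a minimal sketch. It assumes the fit is done by ordinary least squares in the log domain, where log D = log α − β log R is linear; the text only states that a least-squares method is used, so the log-domain formulation and the helper name `fit_power_law` are our assumptions. The data below are synthetic, generated from known parameters, not measurements from the paper.

```python
import math


def fit_power_law(rates, distortions):
    """Fit D = alpha * R**(-beta) by least squares on log D = log alpha - beta * log R.

    Returns (alpha, beta). Hypothetical helper, not taken from the paper.
    """
    if len(rates) != len(distortions) or len(rates) < 2:
        raise ValueError("need at least two (rate, distortion) pairs")
    xs = [math.log(r) for r in rates]
    ys = [math.log(d) for d in distortions]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("rates must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                     # slope of the log-log line = -beta
    intercept = mean_y - slope * mean_x   # intercept = log(alpha)
    return math.exp(intercept), -slope


# Synthetic check: points generated from alpha = 5000, beta = 0.8.
rates = [200.0, 400.0, 800.0, 1600.0]
dists = [5000.0 * r ** -0.8 for r in rates]
alpha_c, beta_c = fit_power_law(rates, dists)
print(round(alpha_c, 3), round(beta_c, 3))  # 5000.0 0.8
```

Fitting in the log domain minimizes relative rather than absolute error, so low-distortion (high-rate) points carry as much weight as high-distortion ones; a nonlinear least-squares fit directly on D would weight the low-rate end more heavily.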

2.2 R-D Model for Synthesized Views

To find the best split of the bit budget between texture and depth, we also need to establish the relationship between the bit rate and the synthesized view distortion. We therefore investigate how the bit rate of the texture video (\( R_{t} \)) and that of the depth maps (\( R_{d} \)) influence the synthesized view quality.

In Fig. 3, the influence of texture and depth on the synthesized view quality is investigated by varying the texture quantization parameter \( Q_{T} \) from 5 to 45 while fixing the depth quantization parameter \( Q_{D} \) at each of 24, 34, 39, 44 and 49 in turn. The synthesized view distortion (\( D_{s} \)) is measured as MSE. As shown in Fig. 3, when one of \( R_{t} \) and \( R_{d} \) is fixed, the relationship between \( D_{s} \) and the other rate can be approximated by a power function.

Fig. 3.

The R-D surface of synthesized view distortion versus bit rate, ‘Kendo’.

The \( D_{s} \)-\( R_{t} \) relationship can be expressed as

$$ D_{s} \left( {R_{t} } \right) = \alpha_{t} R_{t}^{{ - \beta_{t} }} . $$
(3)

Similarly, the \( D_{s} \)-\( R_{d} \) relationship can be expressed as

$$ D_{s} \left( {R_{d} } \right) = \alpha_{d} R_{d}^{{ - \beta_{d} }} . $$
(4)

Assuming that the texture and depth contributions to the synthesized view distortion are additive, combining (3) and (4) gives the distortion model for the synthesized views:

$$ D_{s} = \alpha_{t} R_{t}^{{ - \beta_{t} }} + \alpha_{d} R_{d}^{{ - \beta_{d} }} , $$
(5)

where \( D_{s} \) is the distortion of the synthesized views, \( R_{t} \) and \( R_{d} \) are the bit rates of texture and depth, and \( \alpha_{t} \), \( \beta_{t} \), \( \alpha_{d} \) and \( \beta_{d} \) are model parameters.

2.3 A Joint Bit Allocation Based RC Scheme for 3D-HEVC

Rate control for 3D-HEVC must solve bit allocation at the texture/depth level, the view level and the frame level. The optimal texture/depth allocation problem is to distribute the bit budget between texture and depth so that the total distortion of the synthesized views and the coded views is minimized. Based on the R-D models for the coded texture views (2) and the synthesized views (5), we formulate the optimal bit allocation for overall quality as

$$ \begin{aligned} & \left( {R_{t}^{opt} ,R_{d}^{opt} } \right) = \arg \hbox{min} \left( {\alpha_{t} R_{t}^{{ - \beta_{t} }} + \alpha_{c} R_{t}^{{ - \beta_{c} }} + \alpha_{d} R_{d}^{{ - \beta_{d} }} } \right), \\ & s.t. R_{t} + R_{d} \le R_{c} . \\ \end{aligned} $$
(6)

Let ζ denote the ratio between \( R_{t} \) and \( R_{d} \) at the optimum:

$$ \upzeta = \frac{{R_{t}^{opt} }}{{R_{d}^{opt} }}. $$
(7)

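Because each distortion term in (6) decreases monotonically with its rate, the rate constraint is active at the optimum, i.e., \( R_{t} + R_{d} = R_{c} \). Substituting \( R_{t} = \upzeta R_{c}/(1+\upzeta) \) and \( R_{d} = R_{c}/(1+\upzeta) \), which follow from (7), into (6) yields an objective with the single variable ζ: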

$$ \upzeta^{opt} = \arg \min_{\upzeta > 0} \left( \alpha_{t} \left( \frac{\upzeta}{1 + \upzeta} R_{c} \right)^{-\beta_{t}} + \alpha_{c} \left( \frac{\upzeta}{1 + \upzeta} R_{c} \right)^{-\beta_{c}} + \alpha_{d} \left( \frac{1}{1 + \upzeta} R_{c} \right)^{-\beta_{d}} \right). $$
(8)

Many optimization methods can be used to solve (8); in this paper, Newton's iterative method is used to obtain an approximate optimum. The target bit rates for texture and depth are then

$$ R_{t} = R_{c} \cdot \frac{\upzeta}{{1 +\upzeta}} , $$
(9)
$$ R_{d} = R_{c} \cdot \frac{1}{{1 +\upzeta}} . $$
(10)
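Problem (8) together with (9) and (10) can be illustrated with a short safeguarded Newton iteration. This is a sketch under stated assumptions, not the authors' implementation: it iterates on \( R_{t} \) directly (equivalent to iterating on ζ, since \( R_{t} = \upzeta R_{c}/(1+\upzeta) \)), assumes all α and β are positive, and falls back to bisection whenever a Newton step leaves the current bracket. Every term of the objective is convex in \( R_{t} \) on (0, \( R_{c} \)), so the minimum is unique and the bracketed iteration converges. The function name `optimal_split` is hypothetical.

```python
def optimal_split(alpha_t, beta_t, alpha_c, beta_c, alpha_d, beta_d,
                  r_c, tol=1e-12, max_iter=100):
    """Minimise the objective of (6) with the constraint R_t + R_d = r_c active.

    f(R_t) = alpha_t*R_t**-beta_t + alpha_c*R_t**-beta_c + alpha_d*(r_c - R_t)**-beta_d
    Newton's method is applied to f'(R_t) = 0. Returns (zeta, R_t, R_d).
    """
    def grad(r):   # f'(R_t)
        rd = r_c - r
        return (-beta_t * alpha_t * r ** (-beta_t - 1)
                - beta_c * alpha_c * r ** (-beta_c - 1)
                + beta_d * alpha_d * rd ** (-beta_d - 1))

    def curv(r):   # f''(R_t), strictly positive for positive parameters
        rd = r_c - r
        return (beta_t * (beta_t + 1) * alpha_t * r ** (-beta_t - 2)
                + beta_c * (beta_c + 1) * alpha_c * r ** (-beta_c - 2)
                + beta_d * (beta_d + 1) * alpha_d * rd ** (-beta_d - 2))

    lo, hi = 0.0, r_c            # f' < 0 near 0 and f' > 0 near r_c
    r = 0.5 * r_c
    for _ in range(max_iter):
        g = grad(r)
        step = g / curv(r)           # Newton step for f'(R_t) = 0
        if abs(step) <= tol * r_c:   # converged
            r -= step
            break
        if g > 0:                    # shrink the bracket around the root of f'
            hi = r
        else:
            lo = r
        r_new = r - step
        if not lo < r_new < hi:      # step left the bracket: bisect instead
            r_new = 0.5 * (lo + hi)
        r = r_new
    r_d = r_c - r
    return r / r_d, r, r_d           # zeta from (7), R_t from (9), R_d from (10)


zeta, r_t, r_d = optimal_split(1, 1, 1, 1, 1, 1, r_c=1.0)
print(round(zeta, 6))  # 1.414214
```

With all α and β equal to 1, the optimality condition is \( 2(R_{c} - R_{t})^{2} = R_{t}^{2} \), so ζ = √2, which the example reproduces. The parameter values here are illustrative, not fitted values from the paper.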

To estimate the model parameters \( \alpha_{t} \), \( \beta_{t} \), \( \alpha_{c} \), \( \beta_{c} \), \( \alpha_{d} \) and \( \beta_{d} \), we first encode the frames of the first GOP and then fit the parameters with the least-squares error method.

Given the optimal target bit rates for texture and depth, the bit allocation among the views can then be determined statistically. In this paper, we allocate bits to the different views according to the ratio between the bits of the base view and the dependent views in the anchor encoding.
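The view-level step amounts to a proportional split. The sketch below is an illustration under assumptions: it applies the anchor ratio separately to a texture (or depth) target, and both the helper name `split_by_anchor_ratio` and the per-view anchor bit counts in the example are hypothetical, not data from the paper.

```python
def split_by_anchor_ratio(target_bits, anchor_bits):
    """Split target_bits across views in proportion to each view's bits
    in the anchor encoding (hypothetical helper)."""
    total = sum(anchor_bits.values())
    if total <= 0:
        raise ValueError("anchor bit counts must sum to a positive value")
    return {view: target_bits * bits / total
            for view, bits in anchor_bits.items()}


# Made-up anchor counts: base (center) view plus two dependent views.
texture_alloc = split_by_anchor_ratio(
    9000.0, {"center": 3000, "left": 1500, "right": 1500})
print(texture_alloc)  # {'center': 4500.0, 'left': 2250.0, 'right': 2250.0}
```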

After the texture/depth-level and view-level allocations, the target bit rate must be distributed among the individual frames. We adopt the frame-level bit allocation proposed in our previous work [7]:

$$ R_{n,i} = \begin{cases} R_{n}^{remain} \cdot \phi, & \text{I frame} \\ R_{n}^{remain} \cdot w_{i} \cdot \left( 1 - \phi \right), & \text{otherwise} \end{cases} $$
(11)
$$ R_{n}^{remain} = \frac{\text{bit rate}}{\text{frame rate}} \cdot N_{n} + \frac{R_{n-1}^{remain} - R_{n-1}^{actual}}{N_{rest}^{G}}, $$
(12)

where \( R_{n,i} \) is the target number of bits for the i-th frame in the n-th GOP; \( R_{n}^{remain} \) and \( R_{n}^{actual} \) are the target and actual numbers of bits of the n-th GOP; \( N_{n} \) is the number of frames in the n-th GOP; \( N_{rest}^{G} \) is the number of GOPs not yet coded; \( \phi \) is the fraction of the GOP budget assigned to the I frame, set empirically to 0.4 for the first GOP and 0.25 for the remaining GOPs; and \( w_{i} \) is the empirical weight of a frame in the random-access (RA) hierarchical structure:

$$ w_{i} = \begin{cases} 0.07, & \text{if } POC \bmod 8 = 0 \\ 0.056, & \text{if } POC \bmod 8 = 4 \\ 0.0454, & \text{if } POC \bmod 4 = 2 \\ 0.035, & \text{otherwise} \end{cases} $$
(13)

where POC denotes the Picture Order Count, which gives the output order of the pictures in the video stream.

When overflow or underflow occurs, the difference between the target and actual bits of a GOP is distributed evenly among the remaining GOPs; this is the second term of (12).
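Equations (11)-(13) translate directly into code. The sketch below is a minimal illustration with hypothetical function names; it assumes the caller tracks which frame is the I frame, whether the current GOP is the first one, and how many GOPs remain uncoded (\( N_{rest}^{G} \ge 1 \)), since the text does not specify that bookkeeping.

```python
def frame_weight(poc):
    """Empirical RA-hierarchy weight w_i from (13)."""
    if poc % 8 == 0:
        return 0.07
    if poc % 8 == 4:
        return 0.056
    if poc % 4 == 2:
        return 0.0454
    return 0.035


def gop_budget(bit_rate, frame_rate, n_frames,
               prev_target=0.0, prev_actual=0.0, n_rest_gops=1):
    """GOP budget R_n^remain from (12): the nominal share of the GOP plus the
    previous GOP's surplus (or deficit) spread evenly over the uncoded GOPs."""
    carry = (prev_target - prev_actual) / n_rest_gops
    return bit_rate / frame_rate * n_frames + carry


def frame_target(gop_bits, poc, is_intra, first_gop):
    """Frame target R_{n,i} from (11), with phi = 0.4 (first GOP) or 0.25."""
    phi = 0.4 if first_gop else 0.25
    if is_intra:
        return gop_bits * phi
    return gop_bits * frame_weight(poc) * (1 - phi)


# Illustrative numbers only: 1 Mbit/s at 25 fps, 24-frame GOP.
budget = gop_budget(bit_rate=1_000_000, frame_rate=25, n_frames=24)
i_bits = frame_target(budget, poc=0, is_intra=True, first_gop=True)
b_bits = frame_target(budget, poc=8, is_intra=False, first_gop=True)
```

Summing (13) over POCs 1 to 23 gives 2 × 0.07 + 3 × 0.056 + 6 × 0.0454 + 12 × 0.035 = 1.0004, so the non-I weights add up to roughly one over a 24-frame intra period. That suggests the weights were tuned for one I frame plus 23 hierarchical frames; this is our reading of the numbers, not a statement from the paper.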

The trade-off between the output bit rate (R) and the distortion (D) of the compressed video is determined by the quantization step size (Qs), which is indexed by the quantization parameter (QP). R-Qs and D-Qs models have been studied extensively for previous video coding standards such as H.264/AVC and HEVC. Here we use the linear model proposed in our previous work [7]:

$$ R = \alpha \times X/QP, $$
(14)

where α is a model parameter, R is the coding rate, QP is the quantization parameter, and X is the complexity estimate of the current picture, computed as

$$ X = \left( {\sum\limits_{i = 0}^{n} {(w_{i} \times SAD_{i} )} /\sum\limits_{i = 0}^{n - 1} {(w_{i} \times SAD_{i} )} } \right)^{1 - \lambda } \times R_{n - 1} \times QP_{n - 1} , $$
(15)

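where n is the index of the current frame, \( QP_{n-1} \) is the quantization parameter of the (n-1)-th frame, \( R_{n-1} \) is the actual number of bits of the (n-1)-th frame, \( SAD_{i} \) is the sum of absolute differences (SAD) of the i-th frame, and \( w_{i} \) is defined as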

$$ w_{i} = \frac{0.5^{n - i}}{\sum_{j = 0}^{n} 0.5^{n - j}} $$
(16)
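The QP decision implied by (14)-(16) can be sketched as follows. Several points are assumptions rather than statements from the text: each sum in (15) is read as a normalized, recency-weighted average SAD, with the weights of (16) recomputed for n and for n-1 (read literally with the same weights, a constant SAD would give a ratio above one); the exponent λ is left as a caller-supplied parameter because its value is not given; α = 1 in the example is illustrative; and all function names are hypothetical. Rounding and clipping QP to an integer range is left to the caller.

```python
def recency_weights(n):
    """Weights of (16) for frames 0..n: proportional to 0.5**(n - i), summing to 1."""
    raw = [0.5 ** (n - i) for i in range(n + 1)]
    total = sum(raw)
    return [v / total for v in raw]


def weighted_sad(sads):
    """Recency-weighted average SAD over all frames in `sads` (oldest first)."""
    w = recency_weights(len(sads) - 1)
    return sum(wi * s for wi, s in zip(w, sads))


def complexity(sads, r_prev, qp_prev, lam):
    """Complexity X of (15) for the last frame in `sads`.

    Assumption: both sums in (15) are normalised weighted averages, the
    numerator over frames 0..n and the denominator over frames 0..n-1.
    `lam` is the exponent lambda, which the text does not specify.
    """
    if len(sads) < 2:
        raise ValueError("need SADs for at least the previous and current frames")
    ratio = weighted_sad(sads) / weighted_sad(sads[:-1])
    return ratio ** (1 - lam) * r_prev * qp_prev


def qp_for_target(alpha, x, r_target):
    """Invert the linear model (14), R = alpha * X / QP, for QP."""
    return alpha * x / r_target


# Stationary content: constant SAD gives X = R_{n-1} * QP_{n-1}.
x = complexity([100.0, 100.0, 100.0, 100.0], r_prev=5000.0, qp_prev=32.0, lam=0.5)
qp = qp_for_target(alpha=1.0, x=x, r_target=5000.0)
print(round(qp, 6))  # 32.0
```

Under this reading, a frame with the same recent SAD history and the same bit target as its predecessor keeps the previous QP, and a rise in recent SAD raises X and therefore QP for a fixed target.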

3 Experimental Results

To evaluate the proposed 3D-HEVC rate control algorithm, we integrated it into the reference software HTM10.0 and compared it with the R-lambda algorithm. We tested the algorithm on all eight sequences defined in the common test conditions (CTC), at resolutions of 1024 × 768 and 1920 × 1088. Each sequence consists of three views: the left, the center (coded first) and the right view. After coding, six synthesized views were rendered.

3.1 Control Accuracy

To evaluate the accuracy of bit rate control, the following metric is adopted:

$$ Error = \frac{{\left| {R_{actual} - R_{target} } \right|}}{{R_{target} }} \times 100\% , $$
(17)

where Error is the relative bit error, and \( R_{target} \) and \( R_{actual} \) are the numbers of target bits and actual output bits, respectively.
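Equation (17) is simple enough to state directly in code. This minimal helper (name hypothetical) computes the mismatch for a single target/actual pair; how the paper aggregates errors across sequences or rate points is not stated, so the sketch stops there.

```python
def rate_error_percent(r_actual, r_target):
    """Relative bit-rate mismatch of (17), in percent."""
    if r_target <= 0:
        raise ValueError("target bits must be positive")
    return abs(r_actual - r_target) / r_target * 100.0


# Overshoot and undershoot by the same amount give the same error.
over = rate_error_percent(1020.0, 1000.0)
under = rate_error_percent(980.0, 1000.0)
```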

As shown in Table 1, the proposed RC algorithm achieves a smaller mismatch between the target bits and the actual output bits. This is because the frame-level bit allocation proposed in our previous work [10] is designed to suit I slices, rather than relying on an overflow/underflow handling strategy.

Table 1. Proposed Algorithm R-D Performance Compared With Anchor And R-lambda

3.2 R-D Performance

To objectively evaluate the performance of the proposed RC algorithm, the R-lambda algorithm proposed in [9] is used for comparison. In [9], the target bit rate of each texture video is set to the corresponding bit rate of the HTM anchor, and the depth maps are coded with the same fixed QPs as the anchor. In the proposed algorithm, the target bit rate for all coded views (three texture videos and three depth maps) is set to the anchor's total bit rate.

As shown in Table 1 and Fig. 4, the proposed algorithm achieves much better R-D performance than R-lambda for both the coded texture views and the synthesized views, because the proposed synthesized view distortion model enables an optimal bit allocation between texture and depth. For all views (coded texture views and synthesized views), the performance improvement reaches up to 14.4%, and the average BD-rate gain is 6.9%.

Fig. 4.

The R-D curves for all views of the proposed RC scheme compared with the anchor and R-lambda, ‘Undo_Dancer’ and ‘GT_Fly’

Fig. 4 shows the R-D curves for two sequences. The proposed RC algorithm achieves clearly better R-D performance than the R-lambda model at both high and low bit rates.

4 Conclusions

This paper has presented a joint bit allocation and rate control method for 3D-HEVC based on a synthesized view distortion model, with the goal of achieving the best overall quality. The dependency of the synthesized view distortion on the coded texture video and depth maps is investigated. The proposed bit allocation operates at three levels: the texture/depth level, the view level and the frame level. Experiments on different video sequences show that the proposed method achieves much better R-D performance than the R-lambda rate control algorithm for 3D-HEVC.