1 Introduction

HEVC [20] is the latest video coding standard, and it nearly doubles the rate-distortion (RD) performance of the previous standard, H.264 [24]. Its wider use of intra- and inter-predictive coding techniques [11] strengthens the correlation of RD performance among temporally and spatially adjacent coding units, which makes it necessary to measure quantitatively the effect of reference coding units.

In video coding, rate-distortion optimization (RDO) [21] is often used for frame-level quantization parameter (QP) selection, which can be formulated as

$$ Q^{\ast}=(Q^{\ast}_{1},Q^{\ast}_{2},\ldots,Q^{\ast}_{N})=\underset{Q}{\arg\min}\sum\limits_{i=1}^{N} D_{i}\qquad s.t.\sum\limits_{i=1}^{N} R_{i}\leq R_{A} $$
(1)

where N is the total number of encoded frames, \(R_{A}\) is the target bitrate, \(R_{i}\) and \(D_{i}\) represent the bitrate and distortion of the i-th frame, respectively, and \(Q^{\ast }=(Q^{\ast }_{1},Q^{\ast }_{2},\ldots ,Q^{\ast }_{N})\) denotes the set of optimal QP values. It can be observed from (1) that the optimal QP values must satisfy two conditions: the total distortion over all frames is minimized, and the total number of coding bits does not exceed the target bitrate. An effective method for solving this constrained optimization problem is the Lagrangian multiplier method [10], by which (1) can be converted into the following unconstrained formulation:

$$ Q^{\ast}=\underset{Q}{\arg\min}\, J,\qquad J=\sum\limits_{i=1}^{N} D_{i}+\lambda\times\sum\limits_{i=1}^{N} R_{i} $$
(2)

where J denotes the RD cost. It can be observed from (2) that the optimal QP value corresponds to the minimum RD cost.
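As a minimal sketch of how (2) is used when frames are treated as independent, the following Python snippet picks, for each frame, the QP that minimizes \(D_{i}+\lambda R_{i}\). The function `toy_rd` and its constants are illustrative assumptions, not a model taken from this paper; they only give the cost a plausible shape.

```python
def toy_rd(qp):
    """Toy RD model (assumption): rate falls and distortion rises with QP."""
    rate = 1000.0 * 2.0 ** (-(qp - 22) / 6.0)  # bits, illustrative only
    dist = 2.0 * 2.0 ** ((qp - 22) / 3.0)      # MSE, illustrative only
    return rate, dist


def rd_cost(qp, lam):
    """Per-frame Lagrangian cost D + lambda * R."""
    rate, dist = toy_rd(qp)
    return dist + lam * rate


def select_qps(n_frames, lam, candidates=range(22, 38)):
    """Minimise J = sum(D_i) + lam * sum(R_i) with independent frames.

    Because no frame influences another in this toy setting, the sum is
    minimised by minimising each frame's cost separately.
    """
    return [min(candidates, key=lambda qp: rd_cost(qp, lam))
            for _ in range(n_frames)]


qps = select_qps(n_frames=4, lam=0.064)
total_cost = sum(rd_cost(qp, 0.064) for qp in qps)
print(qps, round(total_cost, 3))
```

Every frame receives the same QP here because the toy model ignores any influence of one frame on another, which is exactly the limitation that dependent RDO addresses.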

However, the stronger dependency in HEVC means that the current independent RDO is no longer optimal. For the purpose of global optimization, several earlier works have addressed RDO. Works [13] and [12] proposed QP refinement algorithms based on the relationship between λ and QP. From the viewpoint of the perceptual sensitivity of the human visual system (HVS), works [29] and [23] proposed adaptive λ calculation algorithms, by which the corresponding QP values were refined. However, to limit complexity, these algorithms are usually applied to units at different levels individually, without considering the dependency among adjacent coding units at different levels. A typical way to improve on independent RDO is to substitute dependent RD models for independent ones. Works [3, 4, 5, 6, 7, 9, 14, 15, 16, 19, 22, 28, 30] proposed various algorithms, e.g., QP refinement and bit allocation, for dependent RDO using dependent RD models. Work [16] proposed a dependent RDO scheme for H.264 based on frame-level DQP and RQP models. Work [5] proposed two quantization parameter cascading (QPC) algorithms based on dependent RDO for the random-access (RA) configuration in HEVC. In work [6], inter-frame dependency is formulated by the importance of each frame, and a metric of just-noticeable temporal pumping artifact (JNTPA), based on characteristics of the human visual system, was proposed to refine the frame-level QP. Work [30] proposed an adaptive QPC scheme for HEVC hierarchical coding that considers the inter-layer dependency, which is modeled as a constant value for the video sequence. Based on observations in a self-domain (S-domain), work [4] proposed empirical dependent R/D models, which were applied in a joint temporal-quality layer bit allocation algorithm formulated as a Lagrangian optimization problem.
Work [7] proposed an adaptive QP selection algorithm for RA coding in HEVC that accounts for the distortion dependency among different coding levels, where the inter-frame dependency is measured by a model based on the energy of prediction residuals. Works [4, 5, 6, 7, 16, 30] all showed a linear relationship between the prediction residual of a coding unit (at the frame level or the hierarchical coding level) and the coding distortion of its reference units. Based on this observation, work [3] proposed a distortion propagation model and the corresponding RD model [22]. Using indirect RD modeling, work [28] proposed a temporally dependent RDO by establishing a macroblock-based source distortion temporal propagation model that takes inter-frame dependency into account. This model was further applied to adapt the Lagrange multiplier [19]. Work [15] proposed an adaptive QPC scheme in which the dependency between different layers was modeled by a linear factor, and the optimal QPs were obtained dynamically for different layers based on the formulated RD curve. Work [14] proposed an improved model based on the previous work [15]; using this model, a global analytical QP refinement scheme was presented that minimizes the GOP distortion subject to the average bitrate of all frames in the GOP. Work [9] proposed a linear quality dependency model between a picture and its references, which is used in bit allocation for scalable video coding (SVC). The above studies are all performed at the frame level or the hierarchical coding level, which covers only a short time span, to avoid high modeling complexity. However, the effect of inter-frame dependency becomes evident when propagated distortion accumulates over a long temporal period. Therefore, to explore the characteristics of inter-frame dependency, more frames ought to be considered. Deep learning, an active research area, is also a useful tool for building models.
In works [25, 27], background modeling and image understanding methods based on deep learning are proposed, respectively. They provide novel methods for evaluating video content complexity. In work [26], a deep-learning-based spatial-temporal attention mechanism for video captioning is proposed. The mechanism shows superior accuracy in calculating spatial-temporal dependency, which will be helpful in exploring inter-frame dependency.

Because inter-predictive coding is isolated between different GOPs by the I frame, inter-frame distortion propagation can occur only inside a GOP. Therefore, this paper adopts the GOP as the basic unit for analyzing the characteristics of inter-frame dependency. First, based on intensive statistical analysis, three parameters are used to formulate the relationship between ΔD and ΔQP: the initial QP (\(\overline {QP}\)), the GOP length, and the average sum of absolute transformed differences (SATD) [18] per frame, which represents the video content of the GOP. Second, the resulting rate change ΔR relative to ΔQP is formulated similarly. Third, the optimized Lagrangian multiplier (λ) is calculated from these two mathematical models. Finally, we refine the QP values based on the optimized λ in terms of dependent RDO. The experimental results show that the proposed frame-level QP selection algorithm decreases BD-BR [1] by about 1.62% in the random-access (RA) configuration and by about 1.13% in the low-delay (LD) configuration, without significantly increasing complexity.

The rest of this paper is organized as follows. Section 2 analyzes the causes and characteristics of inter-frame dependency in HEVC; Section 3 establishes the distortion and rate fluctuation prediction models; Section 4 proposes the QP selection algorithm based on inter-frame dependency; Section 5 presents the experimental results; and Section 6 concludes this paper.

2 Inter-frame dependency modeling

2.1 Analysis of the cause of inter-frame dependency

Many inter-prediction modes are adopted in the latest HEVC video coding standard to eliminate temporal redundancies [17]; Fig. 1 shows an example. In inter-prediction, the current frame can reference multiple adjacent decoded frames, and a block may be referenced multiple times. These characteristics result in a discontinuous inter-frame reference chain, which makes it difficult to analyze and model inter-frame dependency accurately.

Fig. 1 Inter-predictive coding

To better explain the cause of inter-frame dependency, frames \(P_{1}\) and \(P_{2}\) are used in the following analysis, where \(P_{1}\) is the starting frame of the GOP and is coded with intra-prediction only, and \(P_{2}\) is a P frame that references only the reconstructed frame of \(P_{1}\). The mean square error (MSE) of the reference error is adopted to represent the temporal residual energy, and is formulated as (3).

$$ MSE_{2}=E\left[(org_{2}-rec_{1})^{2}\right], $$
(3)

where \(org_{2}\) denotes the original pixels of \(P_{2}\) and \(rec_{1}\) the pixels of the reconstructed frame of \(P_{1}\). Introducing \(org_{1}\), the original pixels of \(P_{1}\), we can rewrite (3) as

$$ MSE_{2}=E\left\{\left[(org_{2}-org_{1})+(org_{1}-rec_{1})\right]^{2}\right\}\approx{\delta_{org(1,2)}^{2}+D_{1ref}} $$
(4)

From (4), it can be observed that the prediction residual of \(P_{2}\) is composed of two parts. The first part is \(\delta _{org(1,2)}^{2}\), which represents the intrinsic difference between the original pictures \(P_{1}\) and \(P_{2}\) and is unrelated to inter-frame dependency. The second part is \(D_{1ref}\), which denotes the coding distortion of \(P_{1}\) that propagates to \(P_{2}\) through inter-prediction. This shows that the coding quality of \(P_{1}\) directly influences the coding quality of \(P_{2}\) and thereby ultimately affects the RD performance of \(P_{2}\). From the above analysis, we conclude that inter-frame dependency arises from inter-frame distortion propagation.
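The approximation in (4) can be made explicit by expanding the square. The short derivation below is a sketch under one assumption that the text does not state explicitly: the coding error of the first frame is taken to be uncorrelated with the difference between the two original frames, so that the cross term is negligible.

$$ E\left\{\left[(org_{2}-org_{1})+(org_{1}-rec_{1})\right]^{2}\right\}=E\left[(org_{2}-org_{1})^{2}\right]+2E\left[(org_{2}-org_{1})(org_{1}-rec_{1})\right]+E\left[(org_{1}-rec_{1})^{2}\right] $$

The first term is \(\delta _{org(1,2)}^{2}\) and the last term is \(D_{1ref}\). Under the stated assumption the middle term is approximately zero, which leaves the two-part sum in (4).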

2.2 Analysis of inter-frame dependency

The above analysis identifies inter-frame distortion propagation as the root cause of inter-frame dependency. However, the fragmented and multi-layered nature of inter-prediction makes it difficult to track and analyze inter-frame distortion propagation accurately. In this section, we provide a holistic analysis of distortion propagation. Inter-frame dependency is isolated by the first frame of the GOP, which is encoded with intra-prediction only. When the offset ΔQP is added to the initial QP of a certain reference frame (\(\overline {QP_{i}}\)), distortion propagates within the GOP. To reflect the overall distortion propagation, we average the distortion offsets of all frames in the GOP,

$$ {\varDelta} D_{T}=\frac{1}{N}\sum\limits_{i=1}^{N}{\varDelta} D_{i} $$
(5)

where \({\varDelta} D_{T}\) is the average of the distortion offsets of all frames in the GOP, which we define as the inter-frame distortion fluctuation.

To explore the relationship between ΔDT and ΔQP, we plot the scatter points and fitted curves of ΔDT against ΔQP for different sequences, as shown in Figs. 2 and 3. In the experiment, the interval between instantaneous decoding refresh (IDR) frames, namely the GOP size, is fixed at 80; each mini-GOP contains four frames by default; rate control is disabled so that the limited bitrate does not cause some encoding modes to be skipped; and the other parameters take their default values in the encoder_randomaccess_main configuration. The \(\overline {QP}\) of the i-th frame is set to 22, 27, 32 and 37, where i is set to 1 here. From work [8], it is known that the clipping range of ΔQP has an obvious influence on coding quality and bitrate smoothness. ΔQP is therefore clipped to the range from -5 to 5, which prevents unexpectedly large QP fluctuations among adjacent frames and smooths the coding quality and the temporal bitrate. We use the curve fitting tool cftool in MATLAB to fit the scatter points of ΔQP and ΔDT. Compared with a nonlinear model such as an exponential function, the linear model achieves a higher correlation coefficient (R) and a lower root mean square error (RMSE): on average, R and RMSE are 0.964 and 0.036 for the nonlinear model and 0.994 and 0.010 for the linear model. A positive linear correlation can be observed for the different \(\overline {QP}\) values, which can be formulated as

$$ {\varDelta} D_{T}=f_{D}({\varDelta} QP)=K\times{\varDelta} QP $$
(6)
Fig. 2 Scatter plots and fitted lines between ΔDT and ΔQP for Bqsquare. a QP = 22. b QP = 27. c QP = 32. d QP = 37

where K is the slope of the linear correlation between ΔDT and ΔQP. The intercept of this linear correlation is zero because there is no inter-frame distortion fluctuation when ΔQP is zero. The larger K is, the faster ΔDT changes with ΔQP, which means stronger inter-frame dependency. The smaller K is, the more slowly ΔDT changes with ΔQP, which indicates weaker inter-frame dependency. Therefore, K directly reflects the inter-frame dependency within a GOP in HEVC.
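To illustrate how a slope such as K in (6) might be estimated from measured points, the sketch below fits a line through the origin by least squares and reports R and RMSE. It is only an assumption about one reasonable way to reproduce the fit outside cftool; the sample data are synthetic and are not measurements from this paper.

```python
import math


def fit_slope_through_origin(dqp, dd):
    """Least-squares slope K for dd ~ K * dqp with zero intercept.

    Returns (K, Pearson correlation R, RMSE of the fitted line).
    """
    n = len(dqp)
    k = sum(x * y for x, y in zip(dqp, dd)) / sum(x * x for x in dqp)
    rmse = math.sqrt(sum((y - k * x) ** 2 for x, y in zip(dqp, dd)) / n)
    mean_x = sum(dqp) / n
    mean_y = sum(dd) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(dqp, dd))
    var_x = sum((x - mean_x) ** 2 for x in dqp)
    var_y = sum((y - mean_y) ** 2 for y in dd)
    r = cov / math.sqrt(var_x * var_y)
    return k, r, rmse


# Synthetic example: true slope 0.3 plus a small alternating perturbation.
dqp = [float(v) for v in range(-5, 6)]
dd = [0.3 * x + (0.01 if i % 2 == 0 else -0.01) for i, x in enumerate(dqp)]
k, r, rmse = fit_slope_through_origin(dqp, dd)
print(k, r, rmse)
```

Forcing the intercept to zero follows the observation that there is no distortion fluctuation when ΔQP is zero.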

3 Establishing the distortion and rate fluctuation models

3.1 Factors affecting inter-frame distortion propagation

According to the above analysis, inter-frame dependency is caused by inter-frame distortion propagation. In Figs. 2 and 3, K varies with \(\overline {QP}\); hence \(\overline {QP}\) is one factor that affects inter-frame distortion propagation. The video content is another factor that affects inter-frame distortion propagation during inter-predictive coding. Because the video content changes along the sequence, we use the average SATD of all frames in the GOP to represent the video content complexity,

$$ \varepsilon=\frac{1}{N}\sum\limits_{i=1}^{N}SATD_{i} $$
(7)

where N is the length of the GOP and ε represents the average SATD of all frames in the GOP. The factors affecting inter-frame distortion propagation can be captured jointly by modeling K as a function of \(\overline {QP}\) and ε, namely,

$$ K=g_{D}(\overline{QP},\varepsilon) $$
(8)
Fig. 3 Scatter plots and fitted lines between ΔDT and ΔQP for Racehorses. a QP = 22. b QP = 27. c QP = 32. d QP = 37

3.2 Inter-frame distortion fluctuation modeling

This paper investigates the relationship (8) for different video sequences. The test flow is listed as follows:

  1) Set the initial number of coding frames to N (N is less than 80), and set the \(\overline {QP}\) of the first frame in the GOP to 18;

  2) Add ΔQP to \(\overline {QP}\), with the initial value of ΔQP being -5;

  3) Encode the video to obtain the ΔDi and SATDi values of each frame, then increase ΔQP by 0.1;

  4) If ΔQP is not larger than 5, return to step 3;

  5) Increase N by 2. If the number of frames is greater than 80, end the coding; otherwise, return to step 1.
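The procedure above can be sketched as two nested loops. In the snippet below, `encode_gop` is a hypothetical callable standing in for a real encoder run, `stub_encoder` is a placeholder with made-up behaviour so that the sketch executes, and the starting frame count `n_start` is an assumption because the text does not state it.

```python
def dqp_values(lo=-5.0, hi=5.0, step=0.1):
    """Inclusive list of offsets, built from integer steps to avoid drift."""
    count = int(round((hi - lo) / step))
    return [round(lo + i * step, 10) for i in range(count + 1)]


def sweep(encode_gop, base_qp=18, n_start=4, n_max=80, n_step=2):
    """Collect (N, dQP, average distortion offset, average SATD).

    encode_gop(n_frames, first_frame_qp) is hypothetical: it must return
    two lists holding the per-frame distortion and the per-frame SATD.
    """
    records = []
    for n in range(n_start, n_max + 1, n_step):
        base_dist, _ = encode_gop(n, base_qp)      # reference run, dQP = 0
        base_mean = sum(base_dist) / n
        for dqp in dqp_values():
            dist, satd = encode_gop(n, base_qp + dqp)
            delta_d_t = sum(dist) / n - base_mean  # average distortion offset
            epsilon = sum(satd) / n                # average SATD
            records.append((n, dqp, delta_d_t, epsilon))
    return records


def stub_encoder(n_frames, first_frame_qp):
    """Placeholder only: distortion grows linearly with QP, SATD is fixed."""
    return [0.1 * first_frame_qp] * n_frames, [100.0] * n_frames


records = sweep(stub_encoder, n_start=4, n_max=8)
print(len(records), records[-1])
```

Each record pairs one GOP length and one ΔQP with the resulting average distortion offset and average SATD, which is the raw material for fitting K.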

In the above procedure, the \(\overline {QP}\) value and the number of frames are varied in order to simulate the actual distortion propagation process. The scatter plots of K against \(\overline {QP}\) and ε are shown in subplots (a) and (c) of Fig. 4, where the horizontal axis represents \(\overline {QP}\) and the vertical axis represents ε. The fitted results are presented in subplots (b) and (d) of Fig. 4. From (6) and (8), the ΔDT model based on K, built by data fitting, is

$$ {\varDelta} D_{T}=\frac{p_{1}+p_{2}\times\overline{QP}+p_{3}\times\varepsilon}{1+p_{4}\times\overline{QP}+p_{5}\times\overline{QP}^{2}+p_{6}\times\overline{QP}^{3}+p_{7}\times\varepsilon}\times{\varDelta} QP $$
(9)

where p1, p2, p3, p4, p5, p6, and p7 are all modeling parameters. The experimental results show that these model parameters vary with the video content complexity, which is reflected by ε. We find that the video content complexity can be divided into three ranges by two ε thresholds. The parameters for the different ranges of video content complexity are shown in Table 1.
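A minimal sketch of evaluating (9) is given below. The thresholds in `THRESHOLDS` and the parameter tuples in `PARAM_SETS` are placeholders chosen for illustration only; they are assumptions and not the values listed in Table 1.

```python
# Placeholder values (assumptions), NOT the fitted values from Table 1.
THRESHOLDS = (1000.0, 5000.0)  # two hypothetical epsilon thresholds
PARAM_SETS = (
    (0.0, 0.01, 0.0, 0.0, 0.0, 0.0, 0.0),  # low-complexity content
    (0.0, 0.02, 0.0, 0.0, 0.0, 0.0, 0.0),  # medium-complexity content
    (0.0, 0.03, 0.0, 0.0, 0.0, 0.0, 0.0),  # high-complexity content
)


def select_params(eps, thresholds=THRESHOLDS, param_sets=PARAM_SETS):
    """Pick one of three parameter sets according to the average SATD."""
    low, high = thresholds
    if eps < low:
        return param_sets[0]
    if eps < high:
        return param_sets[1]
    return param_sets[2]


def k_model(qp_bar, eps, params):
    """Rational model for K: the fraction on the right-hand side of (9)."""
    p1, p2, p3, p4, p5, p6, p7 = params
    num = p1 + p2 * qp_bar + p3 * eps
    den = 1.0 + p4 * qp_bar + p5 * qp_bar ** 2 + p6 * qp_bar ** 3 + p7 * eps
    if den == 0.0:
        raise ValueError("model denominator is zero for these inputs")
    return num / den


def predict_delta_d(qp_bar, eps, dqp):
    """Predicted distortion fluctuation: K(qp_bar, eps) * dQP."""
    return k_model(qp_bar, eps, select_params(eps)) * dqp


print(predict_delta_d(30.0, 500.0, 2.0))
```

Keeping the parameter selection separate from the model evaluation means that the fitted values can be swapped in without touching the formula.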

Fig. 4 Scatter plots of K and fitted surfaces. a Scatter plot of Bqsquare. b Fitted surface of Bqsquare. c Scatter plot of Racehorses. d Fitted surface of Racehorses

Table 1 Model parameters for different video content complexity

3.3 Rate fluctuation prediction modeling

Following a procedure similar to that of the previous section, we statistically characterize, off-line, the correlation between the bitrate fluctuation ΔRT and ΔQP. The results are shown in Figs. 5 and 6. There exists a negative linear correlation between ΔRT and ΔQP, which can be formulated as

$$ {\varDelta} R_{T}=f_{R}({\varDelta} QP)=H\times{\varDelta} QP $$
(10)

where H is the slope of the linear correlation. The results shown in subplots (a) and (c) of Fig. 7 are obtained with the same test procedure as that used for modeling K in Section 3.2. Fig. 7 shows that

$$ H=g_{R}(\overline{QP},\varepsilon) $$
(11)
Fig. 5 Scatter plots and fitted lines between ΔRT and ΔQP for Bqsquare. a QP = 22. b QP = 27. c QP = 32. d QP = 37

Fig. 6 Scatter plots and fitted lines between ΔRT and ΔQP for Racehorses. a QP = 22. b QP = 27. c QP = 32. d QP = 37

Fig. 7 Scatter plots of H and fitted surfaces. a Scatter plot of Bqsquare. b Fitted surface of Bqsquare. c Scatter plot of Racehorses. d Fitted surface of Racehorses

In this paper, the model of ΔRT based on H built by data fitting is

$$ {\varDelta} R_{T}=\left\{H_{0}+A\times \exp\left\{-\frac{1}{2}\left( \frac{\overline{QP}-b_{1}}{w_{1}}\right)^{2}-\frac{1}{2}\left( \frac{\varepsilon-b_{2}}{w_{2}}\right)^{2}\right\}\right\}\times{\varDelta}{QP} $$
(12)

where \(H_{0}\), A, \(w_{1}\), \(w_{2}\), \(b_{1}\) and \(b_{2}\) are all modeling parameters. Subplots (b) and (d) of Fig. 7 show the fitted surfaces. Like the model parameters in (9), the parameters in (12) also vary with the video content, which is classified by ε. The fitted model parameters for different video contents are tabulated in Table 2.
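The Gaussian-shaped surface in (12) can be evaluated with a few lines of Python, as sketched below. The parameter values passed in the example call are arbitrary placeholders (assumptions) and are not the fitted values of Table 2.

```python
import math


def h_model(qp_bar, eps, h0, a, b1, w1, b2, w2):
    """Slope H from (12): an offset plus a 2-D Gaussian in (qp_bar, eps)."""
    exponent = (-0.5 * ((qp_bar - b1) / w1) ** 2
                - 0.5 * ((eps - b2) / w2) ** 2)
    return h0 + a * math.exp(exponent)


def predict_delta_r(qp_bar, eps, dqp, **params):
    """Predicted rate fluctuation: H(qp_bar, eps) * dQP."""
    return h_model(qp_bar, eps, **params) * dqp


# Placeholder parameters for illustration only.
example = dict(h0=-0.1, a=-2.0, b1=30.0, w1=5.0, b2=2000.0, w2=1000.0)
h_peak = h_model(30.0, 2000.0, **example)
delta_r = predict_delta_r(30.0, 2000.0, 1.0, **example)
print(h_peak, delta_r)
```

With a negative H, a positive ΔQP yields a negative predicted ΔRT, which matches the negative correlation described for (10).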

Table 2 Model parameters for different video content complexity

4 Proposed method

In HEVC video coding, the relationship between QP and λ is:

$$ \lambda=-\frac{\partial D}{\partial R}\approx q_{factor}\times{2^{(QP-12)/3}} $$
(13)

where \(q_{factor}\) denotes a parameter related to the encoder structure, and its default value is 0.85. From (13), we can derive \(QP_{i}\) by:

$$ QP_{i}=3\log_{2}\left( -\frac{1}{q_{factor}}\times\frac{\partial{D_{i}}}{\partial{R_{i}}}\right)+12 $$
(14)

This shows that \(QP_{i}\) is determined by \(\lambda _{i}=-\partial D_{i}/\partial R_{i}\). In the previous sections, both the distortion fluctuation and the bitrate fluctuation are modeled on the basis of inter-frame dependency. From (6), (10) and (13), we have:

$$ \lambda_{i}^{\ast}=-\frac{\partial D_{i}}{\partial R_{i}}=-\lim_{{\varDelta} QP \to 0}\frac{{\varDelta}{D_{i}}}{{\varDelta}{R_{i}}}=-\lim_{{\varDelta} QP \to 0}\frac{f_{D}({\varDelta}{QP_{i}})}{{\varDelta}{QP_{i}}}/\frac{f_{R}({\varDelta}{QP_{i}})}{{\varDelta}{QP_{i}}}=-\frac{K_{i}}{H_{i}} $$
(15)

Substituting (15) into (14), we obtain the following QP value:

$$ \widehat{QP}_{i}=3\log_{2}\left( -\frac{1}{q_{factor}}\times\frac{K_{i}}{H_{i}}\right)+12 $$
(16)

Here \(\widehat {QP}_{i}\) represents the frame QP value calculated from the inter-frame dependency. To avoid large fluctuations in video coding quality, this paper adopts the weighted sum of \(\overline {QP}\) and \(\widehat {QP}\) as the new frame QP value,

$$ QP_{i}=\overline{QP_{i}}\times w+\widehat{QP}_{i}\times{(1-w)} $$
(17)

where w ∈ [0,1].

The weight coefficient w affects the final frame QP value, as shown in (17). This paper statistically analyzes the BD-BR performance for different values of w. The results are shown in Fig. 8, where the best BD-BR is obtained when w is around 0.53. Therefore, we set w to 0.53.
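The chain from (15) to (17) can be sketched in a few lines. The constants `Q_FACTOR` and `W` take the values stated in the text; the numeric inputs in the example call are arbitrary assumptions chosen so that the arithmetic is easy to follow.

```python
import math

Q_FACTOR = 0.85  # default q_factor stated in the text
W = 0.53         # weight chosen in the text


def qp_from_slopes(k, h, q_factor=Q_FACTOR):
    """QP from (16): 3 * log2(-(1 / q_factor) * K / H) + 12."""
    lam = -k / h  # optimised lambda from (15)
    if lam <= 0.0:
        raise ValueError("K and H must have opposite signs")
    return 3.0 * math.log2(lam / q_factor) + 12.0


def blend_qp(qp_bar, qp_hat, w=W):
    """Weighted sum from (17) of the initial QP and the model-based QP."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return qp_bar * w + qp_hat * (1.0 - w)


# Example inputs (assumptions): K = 54.4, H = -1.0, initial QP = 32.
qp_hat = qp_from_slopes(54.4, -1.0)
qp_new = blend_qp(32.0, qp_hat)
print(qp_hat, qp_new)
```

The text does not describe how the blended value is mapped to an integer QP, so that step is left out of the sketch.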

Fig. 8 BD-BR under different w. a The fitted curve of Bqsquare. b The fitted curve of Racehorses

5 Experimental results

The platform used in this paper is HM-16.0. To evaluate the accuracy of the proposed K and H prediction models, we statistically analyze the prediction errors of these two models in Fig. 9. More than 95% of the prediction errors are concentrated in the zero-centered range [-0.04, 0.04]. The error statistics of the remaining sequences are shown in Table 3, which demonstrates that the proposed models can predict K and H accurately.

Table 3 Prediction accuracy statistics
Fig. 9 Histograms of prediction error statistics. a Prediction error of Bqsquare. b Prediction error of Racehorses

Two other related approaches are also tested for coding efficiency comparison: Li’s algorithm [14] and the QP cascading (QPC) algorithm [30]. The rate-distortion performance is evaluated by BD-BR and Bjøntegaard delta PSNR (BD-PSNR), where a negative value of BD-BR or a positive value of BD-PSNR represents a performance gain. The coding results of the three algorithms are shown in Table 4, where the default RA configuration [2] in HM is used as the benchmark. It can be seen from Table 4 that, on average, Li’s algorithm increases BD-BR by 5.86%, QPC reduces BD-BR by 1.09%, and the proposed algorithm reduces BD-BR by 1.62%, which indicates that the proposed algorithm can effectively reduce the number of coding bits at the same video coding quality.

Table 4 Coding performance comparison

In addition, the performance of the proposed algorithm varies with the video content. With the proposed algorithm, the average BD-BR performance for Class A and D sequences, such as Traffic, PeopleOnStreet and BlowingBubbles, is better than that for the other sequences. This is because there is less motion in Class A and D. The coding blocks can easily find optimal reference blocks in adjacent frames rather than across multiple frames in the reference frame list. Therefore, the reference frames of Class A and D are referenced more frequently, which means higher inter-frame dependency. Conversely, there is dramatic foreground and background motion in the sequences of Class B, C and F. Their references are more dispersed, which results in weaker inter-frame dependency and only a marginal performance improvement. Since the parameters used in the proposed models can be obtained from off-line preprocessing, the proposed algorithm adds little computational complexity. During the evaluation of the proposed algorithm, we find that the additional time it incurs is less than 0.5 s, which can be ignored in practice.

Compared to [14, 30], the proposed algorithm performs better. From Table 4, it can be seen that Li’s algorithm works well only for sequences with slow motion. That is because the parameter update in Li’s algorithm depends on the parameters of the immediately preceding GOP, which cannot reflect the inter-frame dependency of the current GOP for sequences with fast motion. In QPC, the inter-frame dependency is represented only by a static parameter δ for all video sequences, which makes the algorithm non-adaptive to different video content and limits its performance. In the proposed algorithm, the model parameters are divided into three ranges according to the average video content complexity, which makes the model applicable to various video content. When the proposed algorithm is employed in the LD configuration, the BD-BR gain is about 0.5% smaller. The root cause is that LD is a forward-reference encoding structure, which means that encoding error can propagate to the current frame from only one direction. RA, by contrast, is a bi-directional reference encoding structure in which the encoding of the current frame is related to more frames, so the inter-frame dependency in RA is stronger than that in LD. This also indicates that the proposed algorithm performs better where the inter-frame dependency it exploits is stronger.

6 Conclusion

In this paper, we have proposed a frame-level QP selection algorithm based on inter-frame dependency. First, the distortion fluctuation model between ΔD and ΔQP is established to describe the inter-frame distortion propagation. Then, the rate fluctuation model between ΔR and ΔQP is formulated in a similar way. With these two mathematical models, we determine the Lagrangian multiplier (λ) in terms of dependent RDO. Finally, with the optimized λ, we refine the QP in the sense of dependent RDO. The experimental results show that the proposed frame-level QP selection algorithm decreases BD-BR by about 1.62% in the RA configuration and by about 1.13% in the LD configuration, without significantly increasing complexity.