1 Introduction

Not long ago, traditional computers connected us to the Internet through a display combined with pointing and input devices. Users reach a two-dimensional cyberworld through this screen, yet they remain outside that artificial creation because it exists only inside the computer [1]. With the rapid development of technology, Virtual Reality (VR) has become one of the most popular research fields and a technology with high future potential. It combines a simulated environment, perception, natural skills and sensing devices, allowing users to construct and interact with a virtual three-dimensional world generated by a computer simulation system. VR applications in many areas aim to provide a highly immersive experience, so that users no longer stay outside the digital world but step fully into it.

As computer simulation technology has developed rapidly, the application scenarios of VR and related products have kept expanding in the global marketplace. VR 360 technology affects a wide range of industries. In contrast to simple data transaction and computation, some of these applications now require real-time interaction with users as the technology improves [2]. For example, gaming is the most popular and widespread area of VR entertainment, and the popularity of VR games has increased users' familiarity with the technology and encouraged the design of further applications [3]. Other applications are also popular: a VR museum lets users visit any museum in the world without leaving home [4]; VR tools help designers create and review design changes to improve work efficiency [5]; and a VR fitting room makes it more convenient for users to try on and match outfits when shopping online at home [6, 7]. More importantly, for doctors in training or soldiers, practicing tasks such as surgery and shooting in a controlled virtual environment can lower the risk and danger of actual operations and combat [5].

Traditional simulation is closer to cinema: computer systems simulate use cases and the conditions of events, but users normally only watch the scenarios and do not interact directly with the environment. Unlike a traditional two-dimensional simulation system, which uses a mouse and keyboard as input devices, VR 360 video requires users to wear a head-mounted display (HMD) and use two gravity-sensor controllers or gloves, so that their movements are reflected synchronously on the HMD. Unlike a traditional two-dimensional computer display, an HMD temporarily blocks the connection between users and the real world to reduce distraction and present an illusion of a lifelike new reality [8, 9]. Users are thus surrounded either by a portrayal of the real world or by an imaginary world created by designers.

VR 360 video technology offers a new way for people to interact deeply with the digital world. To enhance users' immersive experience, video coding for VR becomes crucial, because the image resolution, pixel range and frame rate of VR 360 video are significantly greater than those of ordinary two-dimensional video. Each video should reach the user's device quickly and is usually played in a batched manner: every few seconds, the media player reads the buffer to extract video data for the next interval [2]. Despite the wide benefits of VR video technology, improving coding accuracy to raise the compression efficiency of VR video is becoming a new challenge in the industry. This is especially true for the spherical projection of VR 360 latitude pictures: video encoders are generally designed for two-dimensional images and do not consider the source characteristics of VR 360 latitude pictures [10], so coding accuracy can directly affect the projected result.

VR video sequences are stored as longitude-latitude maps. During playback on a VR video device, the content is rendered in real time on a spherical surface as a 360° surround view. Mapping the longitude-latitude image onto the spherical surface severely compresses all pixels except those on the equator, and the compression becomes stronger as the latitude increases. For example, the entire row of pixels at latitude 90° in the longitude-latitude image maps to a single point on the spherical surface. Hence, adjusting the Quantization Parameter (QP) according to this property of the longitude-latitude map can improve the overall performance of coding VR 360 latitude pictures.

In this paper, we consider the relationship between the fixed QP value and the block-level Lagrangian multiplier under the spherical projection of VR 360 latitude pictures, and propose a new method that uses this relationship to determine the block-level QP value and optimize compression efficiency in practical VR video coding. Related works and background are presented in Sect. 2. Section 3 describes our fixed-QP optimization method. Experimental results are given in Sect. 4, and Sect. 5 concludes the paper.

2 Related Works

2.1 Encoding Mode

Many solutions have been proposed in recent years to improve VR 360 video technology. Some works discuss different encoding schemes. In adaptive VR 360 streaming [11], the authors partition the 3D mesh into multiple 3D sub-meshes and construct a hexaface sphere to optimally represent tiled VR 360 videos in 3D space. Specifically, they extend MPEG-DASH SRD to the 3D space of VR 360 videos and provide a dynamic adaptation technique to cope with the bandwidth demands of VR 360 video streaming. Because users watching VR video on HMDs view only a small local area of the whole spherical image, the authors divide the video projection into multiple tiles during encoding and packaging, and use MPEG-DASH SRD to describe the spatial relationship of each tile in the 360° space. Although the experimental results are positive, using the lowest representation on peripheral meshes does not always produce immediately visible changes, and the approach is still not mature enough for practical VR applications.

2.2 Quantization Parameter Allocation

A similar work applies Scalable High Efficiency Video Coding to panoramic videos, projecting a scene from a spherical shape onto one or several two-dimensional (2D) shapes. The authors propose a scalable full-panorama video coding method to cope with insufficient bandwidth, in which the part of the video that users prefer is mapped and coded in high quality first. To present a near-perfect VR appearance, video must be coded at high resolution and transmitted to the VR application with low delay. The user's viewpoint is sent from the HMD to the encoder and is restricted by the viewing angle, so the region the user is most interested in is encoded with priority and in high quality, while the remaining regions are encoded in low quality. The experimental results of that paper are impressive, but the proposal has limitations: because it focuses on the region of interest (RoI), a PSNR comparison of the equator, middle and pole parts of the two-dimensional map between the Scalable High Efficiency Video Coding Test Model (SHM) and the High Efficiency Video Coding Test Model (HM) reflects its effect on RoI quality more clearly, but it does not demonstrate overall coding efficiency.

2.3 Evaluation Index

Other works consider a typical projection, the equirectangular projection [12], which uses a grid with equal steps of longitude and latitude to unfold the spherical surface. The proposed method is efficient but imprecise, because the encoder cannot adapt its coding to coding units at different locations; even when PSNR remains at an acceptable level, the SPSNR or WSPSNR performance may still be very poor. The objective evaluation indexes for VR video sequences, such as SPSNR and WSPSNR, follow the idea that pixel weights decrease as latitude increases. Specifically, in SPSNR, when the longitude-latitude image is mapped onto the spherical surface, the number of sampled pixels decreases from the equator toward the two poles. In WSPSNR, the pixel weights computed for the longitude-latitude image are highest at the equator and decrease toward the two poles.
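
The latitude weighting used by WSPSNR can be illustrated with a short sketch. It assumes the commonly used cosine-of-latitude row weight for an equirectangular frame, w(j) = cos((j + 0.5 - H/2)·π/H) for pixel row j of an H-row image; the function names and the nested-list frame layout are hypothetical and chosen only for illustration.

```python
import math


def erp_row_weights(height):
    """Cosine-of-latitude weight for each pixel row of an equirectangular frame."""
    return [math.cos((j + 0.5 - height / 2) * math.pi / height)
            for j in range(height)]


def ws_psnr(ref, dist, max_value=255.0):
    """Latitude-weighted PSNR of two same-sized 2D lists of luma samples."""
    weights = erp_row_weights(len(ref))
    weighted_error = 0.0
    weight_sum = 0.0
    for w, ref_row, dist_row in zip(weights, ref, dist):
        for a, b in zip(ref_row, dist_row):
            weighted_error += w * (a - b) ** 2
            weight_sum += w
    wmse = weighted_error / weight_sum
    if wmse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / wmse)
```

With this weighting, the same pixel error costs less near a pole than on the equator, whereas plain PSNR would score both cases identically.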

2.4 Rate-Distortion Optimization

A related work uses rate-distortion optimization (RDO) to reduce distortion in the spherical domain [13]. It first maps the spherical video onto a two-dimensional image and then encodes the unfolded 360° video by assigning different weights to the pixel values. However, the improvement obtained by adjusting the Lagrange multiplier λ in that article is still not accurate enough.

Rate-distortion optimization is the core technology of VR video coding. It is supported by rate-distortion theory, which seeks the greatest lower bound on the bit-rate needed to recover the source under a given distortion constraint. In practical video coding, the RDO problem is converted into selecting coding parameters within the coding framework so that the minimum bit-rate is used while the distortion constraint is met. Although an exhaustive search can find the optimal parameters by traversing all possible coding parameter sets, its high time complexity means it is rarely used in practice. Because video coding uses blocks as coding units and the coding parameters of each unit are independent, the optimal coding parameters of each coding unit can be taken to belong to the globally optimal parameter set, which means the global optimization problem can be split into several local optimization problems.

When the Lagrangian multiplier λ is introduced into RDO, the constrained optimization problem in video coding is turned into an unconstrained one. The Lagrangian optimization method is what makes RDO practical for VR 360 video coding. Owing to its low complexity and high performance, RDO based on the Lagrangian multiplier has been widely used in mainstream encoders such as H.264/AVC and H.265/HEVC. During coding, RDO with Lagrangian optimization selects, as the final coding output, the mode and parameters that minimize the cost J = D + λR, where D is the distortion, R is the bit-rate and λ is the Lagrangian multiplier. λ is derived under a high-bit-rate assumption and is then adjusted with empirical values in different encoders. Thus, the choice of λ directly affects coding performance.
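
The role of λ in the cost J = D + λR can be made concrete with a minimal sketch. The candidate modes, their distortion and rate figures, and the function names below are invented for illustration and are not taken from any encoder.

```python
def rd_cost(distortion, rate, lam):
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return distortion + lam * rate


def choose_mode(candidates, lam):
    """Return the (name, distortion, rate) candidate with the smallest cost."""
    return min(candidates, key=lambda c: rd_cost(c[1], c[2], lam))


# Hypothetical candidates: (mode name, distortion, rate in bits).
CANDIDATES = [
    ("intra", 100.0, 50.0),
    ("inter", 140.0, 20.0),
    ("skip", 400.0, 1.0),
]
```

With these made-up numbers, a small multiplier (λ = 1) favors the low-distortion "intra" candidate, λ = 4 selects "inter", and a large multiplier (λ = 20) selects the cheapest "skip" candidate. A larger λ therefore trades quality for bit-rate.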

3 VR 360 Encoding Optimization Scheme

Unlike traditional video sources, VR 360 video has unique characteristics. The pixels of a longitude-latitude picture carry latitude-dependent weights when the picture is projected onto a sphere for VR display. Therefore, a dedicated quality evaluation metric, SPSNR, is widely used to assess coding performance, and the encoding process for VR 360 video can reasonably differ from that of traditional video. Although some QP allocation approaches have been introduced for VR 360 encoding, they hardly achieve any BD-rate gain because the block-level delta-QP technique does not reach its optimal performance in the latest reference software RD19.0 with VR extension. To obtain compression efficiency without the effect of delta-QP, this proposal provides a latitude-based Lagrange multiplier (lambda, λ) optimization scheme that reallocates the CTU-level λ value.

At present, the objective quality assessment of panoramic video is still based on the Mean Square Error (MSE) of pixel distortion. For a VR 360 latitude picture, however, distortion is no longer computed as a point-to-point MSE over a two-dimensional image; instead it is averaged over the three-dimensional spherical surface, so that equal areas on the sphere contribute equally. The Rate Distortion Optimization (RDO) process for VR 360 latitude pictures should therefore be revised to follow this rule, accumulating distortion over equal areas on the spherical surface. Moreover, because the longitude mapping ratio from the spherical surface to the VR 360 latitude picture is 1:1, the ratio between the ring area on the spherical surface and the pixel area in the VR 360 image depends only on the latitude. Figure 1 shows the pixel proportion of the spherical projection.

Fig. 1 Spherical projection of VR 360 latitude picture

In Fig. 1, the latitude is expressed by the zenith angle. Consider a narrow ring belt of pixels at zenith angle θ, whose upper and lower bounds differ by the angle dθ. The height of the ring belt \( h_{ring} \) can be approximated by Eq. (1).

$$ h_{ring} = r \cdot \sin d\theta $$
(1)

where r is the projected sphere radius.

Then the area of the ring belt with zenith angle θ is

$$ S(\theta ) = 2\pi \cdot r \cdot \sin \theta \cdot h_{ring} = 2\pi \cdot r^{2} \cdot \sin \theta \cdot \sin d\theta $$
(2)

and the corresponding area of the pixel ring belt in the original VR 360 frame is

$$ S_{org} (\theta ) = S\left( {\frac{\pi }{2}} \right) = 2\pi \cdot r^{2} \cdot \sin d\theta $$
(3)

where \( S_{org} (\theta ) \) denotes the area of the pixel ring belt with zenith angle \( \theta \) in the original frame, which equals that of the equator ring belt \( S\left( {\frac{\pi }{2}} \right) \).

Therefore, after spherical projection, the area proportion of the ring belt at \( \theta \) is the ratio given in Eq. (4).

$$ \frac{S(\theta )}{{S\left( {\frac{\pi }{2}} \right)}} = \sin (\theta ) $$
(4)
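
The proportion in Eq. (4) can be checked numerically with a brief sketch. It assumes that pixel row j of an H-row longitude-latitude picture is centered at zenith angle θ = (j + 0.5)·π/H and uses the small-angle approximation sin dθ ≈ dθ; the function names are hypothetical.

```python
import math


def row_zenith_angle(row, height):
    """Zenith angle (radians) at the center of a pixel row; 0 is the top pole."""
    return (row + 0.5) * math.pi / height


def area_proportion(row, height):
    """Eq. (4): ring-belt area on the sphere relative to the equator ring."""
    return math.sin(row_zenith_angle(row, height))


def ring_area(row, height, radius=1.0):
    """Eq. (2) with sin(d_theta) approximated by d_theta = pi / height."""
    d_theta = math.pi / height
    return 2.0 * math.pi * radius ** 2 * area_proportion(row, height) * d_theta


def total_area(height, radius=1.0):
    """Sum of all ring belts; should approach the sphere area 4*pi*r^2."""
    return sum(ring_area(row, height, radius) for row in range(height))
```

Summing the ring belts over all rows approximately recovers the sphere area 4πr², which is a quick sanity check on the sin θ proportion.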

The spherical PSNR evaluation adopts a similar pixel proportion: it samples pixels according to the projected pixel density. Thus, the bit-rate allocation in VR 360 video coding can follow the distribution of the area proportion, which is expected to reduce the bit-rate while preserving subjective and objective quality. Considering the area proportion, a reasonable bit-rate ratio assumption is

$$ \frac{{{\text{R}}(\theta )}}{{{\text{R}}\left( {\frac{\pi }{2}} \right)}} = \sin \theta = \frac{S(\theta )}{{S\left( {\frac{\pi }{2}} \right)}} $$
(5)

where \( {\text{R}}(\theta ) \) and \( {\text{R}}\left( {\frac{\pi }{2}} \right) \) are the corresponding bit-rates at \( \theta \) and at the equator. With this assumption, the lambda ratio factor can be derived in the following steps.

According to Zhou [14], a typical R-QP model is

$$ R = \alpha \cdot e^{ - \beta \cdot QP(\theta )} $$
(6)

where \( \alpha \) and \( \beta \) are model parameters related to the source characteristics and QP(\( \theta \)) is the block-level QP value at zenith angle \( \theta \).

Thereafter, the bit-rate ratio can be further expressed by substituting Eq. (6) into Eq. (5) as

$$ \frac{{{\text{R}}(\theta )}}{{{\text{R}}\left( {\frac{\pi }{2}} \right)}} = \frac{{e^{ - \beta \cdot QP(\theta )} }}{{e^{{ - \beta \cdot QP\left( {\frac{\pi }{2}} \right)}} }} = \sin \theta $$
(7)

where QP(\( \pi /2 \)) is the QP value in block-level at the equator.

Taking the natural logarithm of both sides of Eq. (7) gives the QP difference in Eq. (8).

$$ - \frac{(\ln \sin \theta )}{\beta } = QP(\theta ) - QP\left( {\frac{\pi }{2}} \right) = QP(\theta ) - QP_{sys} $$
(8)

where \( QP_{sys} \) represents the frame-level QP value allocated by the current system.

Hence, the block-level QP value at zenith angle \( \theta \) can be calculated by Eq. (9).

$$ {\text{QP}}(\theta ) = QP_{sys} - \frac{\ln (\sin \theta + \varepsilon )}{\beta } $$
(9)

where \( \beta \) takes the value 0.35 and \( \varepsilon \) is a small constant, set to 0.08, which prevents \( \ln (\sin \theta ) \) from diverging when \( \sin \theta = 0 \).
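
Eq. (9) can be evaluated directly, as in the following minimal sketch. The constants are the values quoted above; the function name and the example QP of 32 mentioned afterwards are assumptions made for illustration only.

```python
import math

BETA = 0.35      # R-QP model parameter quoted for Eq. (9)
EPSILON = 0.08   # guard constant that keeps the logarithm finite at the poles


def qp_at_zenith(qp_sys, theta, beta=BETA, epsilon=EPSILON):
    """Eq. (9): block-level QP at zenith angle theta (radians)."""
    return qp_sys - math.log(math.sin(theta) + epsilon) / beta
```

For a hypothetical frame-level QP of 32, this gives a value of roughly 39.2 at a pole (θ = 0) and roughly 31.8 at the equator (θ = π/2). Because of ε, the equator offset is slightly negative rather than exactly zero.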

Therefore, according to [15], we obtain Eq. (10).

$$ \lambda (\theta ) = 0.85 \cdot 2^{{\frac{{{\text{QP}}(\theta ) - 12}}{3}}} $$
(10)

where 0.85 is a pre-configured empirical value of the encoder, which depends on the frame type, the specification and the position of the frame within the group of pictures (GOP). The constants 3 and 12 are fixed in H.264 and H.265.

From \( \lambda (\theta ) \) and \( \lambda_{\text{sys}} \), we obtain the correction to the fixed QP by Eq. (11).

$$ \Delta {\text{QP}}(i) = K \cdot \log_{2} \frac{{\lambda (\theta_{i} )}}{{\lambda_{sys} }} $$
(11)

The quantization parameter \( {\text{QP}}(i) \) of the i-th coding unit block is then given by Eq. (12).

$$ {\text{QP}}(i) = {\text{QP}}_{sys} (i) + \Delta {\text{QP}}(i) $$
(12)

where \( {\text{QP}}_{\text{sys}} (i) \) represents the system default value and \( \Delta {\text{QP}}(i) \) is the correction to the fixed QP given by Eq. (11).
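
As a consistency check on Eqs. (10) and (11), assume that \( \lambda_{sys} \) is obtained from Eq. (10) evaluated at \( QP_{sys} \); the text does not state this explicitly. Under that assumption the logarithm in Eq. (11) reduces to a scaled QP difference:

$$ \log_{2} \frac{\lambda (\theta_{i} )}{\lambda_{sys}} = \log_{2} \frac{0.85 \cdot 2^{(QP(\theta_{i} ) - 12)/3}}{0.85 \cdot 2^{(QP_{sys} - 12)/3}} = \frac{QP(\theta_{i} ) - QP_{sys}}{3} $$

With this reading, choosing \( K = 3 \) would make \( \Delta {\text{QP}}(i) \) equal to the offset \( - \ln (\sin \theta_{i} + \varepsilon )/\beta \) from Eq. (9). The value of \( K \) is not given in the text, so this is only an illustrative interpretation.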

Algorithm 1: Quantization Parameter correction of VR video coding in block level

Require: The coding unit of the current frame

Ensure: The block-level quantization parameter \( {\text{QP}}(i) \) of the i-th row and the modified Lagrange multiplier \( \lambda \left( \theta \right) \)

        Step 1: Record the zenith angle \( \theta \) and the correction factor \( \sin \theta \) by Eqs. (7) and (8).

        Step 2: Obtain the frame-level quantization parameter \( {\text{QP}}_{\text{sys}} \) allocated by the current system.

        Step 3: Obtain the quantization parameter at the current angle, \( {\text{QP}}(\theta ) \), by Eq. (9).

        Step 4: Obtain the Lagrange multiplier at the current angle, \( \lambda (\theta ) \). Record the current block’s row number i on the latitude map.

        Step 5: Calculate the modified Lagrange multiplier \( \lambda (\theta_{\text{i}} ) \) according to Eq. (10).

        Step 6: Calculate the corrected quantization parameter \( {\text{QP}}(\theta_{\text{i}} ) \) according to Eqs. (11) and (12).

return \( {\text{QP(}}\theta_{\text{i}} ) \)

As shown in the algorithm above, Step 1 obtains the zenith angle \( \theta \) and the correction factor \( \sin (\theta ) \) by Eqs. (7) and (8). Step 2 obtains \( {\text{QP}}_{\text{sys}} \), the frame-level quantization parameter allocated by the current system. Step 3 calculates the quantization parameter \( {\text{QP}}(\theta ) \) at the current angle from \( {\text{QP}}_{\text{sys}} \) and the constant parameters according to Eq. (9). Step 4 records the row number i of the current block-level coding unit and, based on \( {\text{QP}}(\theta ) \) from Step 3 and the given constants, obtains the Lagrange multiplier \( \lambda (\theta ) \) at the current angle. Step 5 calculates the modified Lagrange multiplier \( \lambda (\theta_{\text{i}} ) \) from \( {\text{QP}}(\theta ) \) using Eqs. (9) and (10). Step 6 outputs the corrected quantization parameter \( {\text{QP}}(\theta_{\text{i}} ) \).
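
The steps of Algorithm 1 can be sketched as follows. This is not the reference-software implementation but a minimal illustration under several assumptions that the text does not specify: block row i of `rows` is centered at θ = (i + 0.5)·π/rows, \( \lambda_{sys} \) is Eq. (10) evaluated at \( {\text{QP}}_{\text{sys}} \), and K = 3. All function names are hypothetical.

```python
import math

BETA = 0.35      # model parameter quoted for Eq. (9)
EPSILON = 0.08   # guard constant quoted for Eq. (9)
K = 3.0          # ASSUMPTION: scaling factor of Eq. (11), not given in the text


def lagrange_multiplier(qp):
    """Eq. (10): Lagrange multiplier for a given QP."""
    return 0.85 * 2.0 ** ((qp - 12.0) / 3.0)


def corrected_qp_for_row(row, rows, qp_sys, k=K):
    """Return (corrected QP, modified lambda) for one block row."""
    # Step 1: zenith angle at the row center (assumed row-to-angle mapping).
    theta = (row + 0.5) * math.pi / rows
    # Step 3: latitude-dependent QP from Eq. (9); qp_sys comes from Step 2.
    qp_theta = qp_sys - math.log(math.sin(theta) + EPSILON) / BETA
    # Steps 4-5: Lagrange multipliers from Eq. (10).
    lam_theta = lagrange_multiplier(qp_theta)
    lam_sys = lagrange_multiplier(qp_sys)  # assumed definition of lambda_sys
    # Step 6: Eq. (11) for the correction, then Eq. (12) for the final QP.
    delta_qp = k * math.log2(lam_theta / lam_sys)
    return qp_sys + delta_qp, lam_theta


def corrected_qps(rows, qp_sys):
    """Apply the correction to every block row of a frame."""
    return [corrected_qp_for_row(i, rows, qp_sys) for i in range(rows)]
```

The sketch keeps QP as a floating-point value. A practical encoder would presumably round it to an integer and clip it to the valid QP range, which is omitted here.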

4 Experimental Result

The quality of two-dimensional video coding is generally evaluated by BD-Rate and BD-PSNR, both of which collect the objective quality, measured as Peak Signal to Noise Ratio (PSNR), and the bit-rate at several test points, connect the points by high-order interpolation, and integrate the difference between the resulting curves. When ordinary video and VR 360 video are compared, the bit-rate of the coded output is unambiguous, but the objective quality PSNR differs considerably. Because of the VR presentation format, the picture is not displayed directly on the HMD but is first mapped onto a spherical surface. Two-dimensional PSNR therefore cannot precisely describe the objective quality on the three-dimensional spherical surface. For this reason, Spherically uniform Peak Signal to Noise Ratio (SPSNR) and Weighted-to-Spherically-uniform Peak Signal to Noise Ratio (WSPSNR) have become the commonly used evaluation indexes for VR 360 video.

Tables 1, 2 and 3 show the performance gain of the proposed method under three test configurations.

Table 1 BD-rate for low-delay (LD) coding
Table 2 VR360 RA(B3)
Table 3 VR360 RA(B7)

Table 1 shows the performance gain of the proposed method in the low-delay (LD) configuration.

In Table 1, the objective quality shows different degrees of gain under the three evaluation metrics PSNR-Y, SPSNR and WSPSNR. In particular, the gains for SPSNR and WSPSNR, which focus on the quality of the VR experience, are 5.2% and 5.2%, respectively. The time complexity of 96% indicates that the proposed method is on par with the anchor and introduces no additional computational overhead.

Table 2 shows the performance gain of the proposed method under random access (RA) when the number of bi-directional B-frames is set to 3.

In Table 2, the objective quality shows different degrees of gain under the three evaluation metrics PSNR-Y, SPSNR and WSPSNR. The gains in SPSNR and WSPSNR reach 7.99% and 7.97%, respectively, which are higher than the 6.18% gain in PSNR-Y. The time complexity of 98% indicates that the proposed method saves 2% of the encoding time.

Table 3 shows the performance gain of the proposed method under random access (RA) when the number of bi-directional B-frames is set to 7.

In Table 3, the objective quality shows different degrees of gain under the three evaluation metrics PSNR-Y, SPSNR and WSPSNR. All three PSNR metrics show a gain on all eight test sequences. The gains in SPSNR and WSPSNR are 13.02% and 12.94%, respectively, higher than the 11.23% gain in PSNR-Y. The time complexity of 98% indicates that the proposed method saves 2% of the encoding time.

5 Conclusion

In this paper, we improve the encoding performance of VR 360 videos by providing a QP optimization algorithm. Our experimental results were cross-checked by Samsung Electronics and demonstrate that, under the SPSNR metric, the proposed scheme obtains a significant BD-rate gain. The proposed technology avoids the BD-rate loss of the delta-QP scheme in the AVS RD software while effectively improving coding performance.