1 Introduction

In a VR system, multiple cameras capture a 360-degree real-world scene, and image rendering is then used to restore the scene. After the collected spherical information has been rendered, viewers can watch the video immersively through a head-mounted display. However, 360-degree video requires a high frame rate and high resolution, which poses great challenges to storage capacity and transmission bandwidth, so it needs a more efficient compression algorithm. To handle and compress spherical content, the spherical information must be mapped onto a 2D image before encoding. A variety of mapping methods have been proposed at JVET meetings, such as ERP projection, CMP projection [2], SSP projection and ECP projection. The projected content can then be handled by a standard video coding framework such as HEVC/H.265 [18, 22]. However, the projected image exhibits different levels of distortion depending on the coordinates [24], which makes rate-distortion optimization in video coding inefficient.

To solve this problem, solutions such as adaptive QP compensation are used to improve coding efficiency. Spherical images are sampled unevenly after being projected onto a 2D image, for example in the ERP format. ERP samples the pole areas more densely, so the encoder spends more bits in those regions under a constant QP configuration. This typically makes the quality of viewports in the pole areas higher than that of viewports in the equatorial area. Therefore, the adaptive QP encoding approach applies a coarser QP to blocks that are closer to the poles, because the projection gives those areas a higher sampling density. The non-uniform sampling of projection schemes for 360-degree video can thus be exploited to improve compression efficiency by modulating the distortion according to the sampling density.

To reduce coding time and complexity, fast CU partition and fast intra mode selection are used. HEVC uses three kinds of units in coding: the Coding Unit (CU), the Prediction Unit (PU) and the Transform Unit (TU). In contrast to the fixed-size macroblocks of H.264, HEVC provides CU sizes from 64 × 64 down to 8 × 8, corresponding to depths 0 to 3, respectively. Fig. 1 shows the splitting and pruning of the quadtree structure of a CTU.

Fig. 1 The partition and pruning of a CTU

Intra mode selection: The first step is a coarse selection (rough mode decision, RMD), which calculates the low-complexity rate-distortion cost of all 35 modes (DC, PLANAR and 33 angular modes) and selects the most likely candidates to fill the initial list; the number of candidates depends on the PU size, as given in Table 1. The encoder then checks whether the most probable modes (MPMs) derived from the neighboring blocks are in the initial list and adds them if not. Finally, RDO is calculated for each mode in the list in turn to obtain the final prediction mode.

Table 1 Number of candidate modes for each PU size

2 Related work on fast HEVC encoding

Current fast algorithms for intra coding mainly consist of fast CU partition and fast PU mode selection. For fast CU partition, the optimization mainly comes from early termination of CU partition based on prediction of the depth range. For PU mode selection, the optimization is mainly performed over the 35 prediction modes and the MPMs.

For early termination of CU partition and depth skipping, Shen et al. [21] proposed a fast CU size decision algorithm for HEVC intra coding that speeds up the process by reducing the number of candidate CU sizes to be checked for each treeblock. Shen et al. [20] proposed a fast CU size decision algorithm that determines the CU depth range (including the minimum and maximum depth levels) and skips specific depth levels rarely used in the previous frame and neighboring CUs. Kim et al. [11] proposed an efficient CU determination algorithm that uses spatial and temporal information, in which 13 neighboring coding tree units (CTUs) are defined. Lee et al. [13] proposed a CU size decision algorithm based on statistical analysis that speeds up the coding process in three ways: SKIP mode skipping, early CU skipping and early CU termination. Min et al. [16] proposed a fast algorithm for CU size decision in intra coding, in which global and local edge complexities in the horizontal, vertical, 45-degree diagonal and 135-degree diagonal directions are used to decide the partitioning of a CU. Cho et al. [6] proposed a fast CU splitting and pruning method for HEVC intra coding, performed in two complementary steps: (1) early CU split decision and (2) early CU pruning decision.

For fast PU mode selection, Zhang et al. [28] proposed a Hadamard cost-based progressive rough mode search that selectively checks the potential modes instead of traversing all candidates. Zhang et al. [29] analyzed the relation between a block's texture characteristics and its best coding mode and, based on this analysis, developed an adaptive strategy for fast mode decision in HEVC intra coding. Zhu et al. [32] proposed an intra prediction mode pruning method based on decision trees and a new three-step search algorithm, aiming at higher encoding efficiency than standard HEVC. Liu et al. [15] proposed an adaptive mode decision algorithm based on texture complexity and direction for HEVC intra prediction. Zhang et al. [30] proposed a fast intra CU decision algorithm based on the texture characteristics of the video and refined the partition results by using the coding bits of each CU as auxiliary information. However, these algorithms were designed mainly for natural sequences and do not take into account the characteristics of 360-degree video, which has higher resolution and an uneven sampling distribution after being projected onto a 2D image.

This paper proposes three methods to improve intra coding performance. First, we use adaptive QP compensation [23] based on the y coordinate of the pixel to improve the compression efficiency of 360-degree video. Second, we apply early termination of CU partition based on spatial correlation to reduce encoding time, and we relax the decision conditions in the pole areas, based on the WS-PSNR weight in ERP projection, to further speed up encoding. Third, to further decrease coding complexity, we use the Prewitt operator to predict the mode directly and remove unnecessary prediction modes through RDO cost analysis.

3 Proposed algorithm

In this paper, first, according to the uneven sampling characteristic of ERP projection, adaptive QP compensation is performed mainly at the pole areas, which are severely stretched, modulating the distortion and improving the compression efficiency of 360-degree video. Then, to decrease encoding time, early termination of CU partition based on prediction of the depth range [9, 27] is proposed, which skips unnecessary depth traversal. In addition, the weight near the pole areas is very small in ERP projection, which means that distortion in the pole areas has very little impact on overall quality as assessed by WS-PSNR. We can therefore relax the decision conditions in these regions to further decrease coding time. Finally, to further reduce encoding complexity, we treat the 35 intra prediction modes [12] differently according to the PU size. A 4 × 4 PU uses the Prewitt operator to predict the mode directly, while 8 × 8, 16 × 16, 32 × 32 and 64 × 64 PUs reduce encoding complexity through adaptive mode selection in the RMD process.

3.1 Improved adaptive QP compensation

The projection of 360-degree video samples the 2D image unevenly. For example, 360-degree video in equirectangular projection (ERP) has denser samples at the pole areas, so more bits are spent in those regions under a constant QP configuration. Therefore, the adaptive QP encoding approach applies a coarser QP to blocks that are closer to the poles, because the projection gives those areas a higher sampling density.

In addition, to control video quality, the adaptive QP cannot be made too coarse in the pole areas. By analyzing the WS-PSNR weight w, we find that it suits the ERP projection scheme and that further changes can be built on it. Therefore, we modify the QP value according to formula (1):

$$ {QP}_{new}= QP-3\times {\log}_2\left({w}_{\mathrm{new}}\right) $$
(1)

Compared with the existing methods [14], the QP in this paper decreases in the equatorial region and increases in the pole areas, instead of simply increasing the QP to reduce bits, which may sacrifice video quality. This is because the newly defined variable w_new ranges from 0.09478 at the pole to 1.56698 at the equator. That is, in the pole areas log2(w_new) is less than 0, which makes QP_new greater than QP; in the equatorial areas log2(w_new) is greater than 0, which makes QP_new less than QP. This is why the algorithm quantizes the equatorial region finely and the pole areas coarsely.
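As a quick check, formula (1) can be evaluated at the two endpoint values of the new weight quoted above (results rounded to two decimals; how the encoder rounds the offset to an integer QP is not specified here):

$$ {w}_{new}=0.09478:\kern0.5em {QP}_{new}= QP-3\times {\log}_2\left(0.09478\right)\approx QP+10.20 $$
$$ {w}_{new}=1.56698:\kern0.5em {QP}_{new}= QP-3\times {\log}_2\left(1.56698\right)\approx QP-1.94 $$

So the compensation raises the QP by roughly 10 at the pole and lowers it by roughly 2 at the equator.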

The weight used by WS-PSNR is as follows:

$$ w\left(i,j\right)=\cos \left(\left(j-\frac{N}{2}+\frac{1}{2}\right)\times \frac{\pi }{N}\right)\kern0.75em $$
(2)

where N is the height of the frame in pixels and j is the y coordinate of the pixel in the frame (j runs from 0 to N - 1); note that N is reused for the CTU height in formula (3). The proposed algorithm is based on the WS-PSNR weight and is calculated for each CTU. The calculation process is as follows:

(1) Define Num_of_Ctu_Height as the number of CTUs in a column. For each CTU, sum the weights of its pixel rows and take the average:

$$ {w}_{Index\_ of\_ Ctu\_ Average\_ Weight}=\frac{1}{N}\sum \limits_{j= index\times N}^{N-1+ index\times N}w\left(i,j\right) $$
(3)

Here index is the serial number of a CTU in a column, ranging over [0, Num_of_Ctu_Height - 1], and N in formula (3) represents the height of a CTU.

(2) Sum the average CTU weights obtained in the first step over a column:

$$ Total\_ Ctu\_ Average\_ weight=\sum \limits_{index=0}^{Num\_ of\_ Ctu\_ Height-1}{w}_{Index\_ of\_ Ctu\_ Average\_ Weight} $$
(4)

(3) Modify the weight as follows:

$$ {w}_{new}=\frac{w_{Index\_ of\_ Ctu\_ Average\_ Weight}}{Total\_ Ctu\_ Average\_ weight}\times Num\_ of\_ Ctu\_ Height $$
(5)
$$ {QP}_{new}= QP-3\times {\log}_2\left({w}_{new}\right) $$
(6)
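The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the encoder integration: the function names are hypothetical, the CTU size is assumed to be 64, the frame height is assumed to be a multiple of the CTU size, N in formula (2) is taken as the frame height, and the QP offset is left unrounded.

```python
import math

def ctu_row_weights(frame_height, ctu_size=64):
    """Return w_new for every CTU row of an ERP frame (formulas (2) to (5)).

    Assumes frame_height is a multiple of ctu_size.
    """
    rows = frame_height // ctu_size  # Num_of_Ctu_Height
    # Formula (2): WS-PSNR weight of each pixel row, with N = frame height.
    w = [math.cos((j - frame_height / 2 + 0.5) * math.pi / frame_height)
         for j in range(frame_height)]
    # Formula (3): average weight of the pixel rows inside each CTU row.
    avg = [sum(w[k * ctu_size:(k + 1) * ctu_size]) / ctu_size
           for k in range(rows)]
    # Formula (4): sum of the per-CTU averages over a column.
    total = sum(avg)
    # Formula (5): normalise so that the mean of w_new over a column is 1.
    return [a / total * rows for a in avg]

def qp_new(qp, w_new):
    """Formula (6): compensated QP (unrounded; rounding is left open)."""
    return qp - 3 * math.log2(w_new)

# Example: the 3328 x 1664 encoding size used for 4K sequences.
weights = ctu_row_weights(1664)
offsets = [qp_new(32, w) - 32 for w in weights]  # QP change per CTU row
```

With a frame height of 1664 and 64 × 64 CTUs, the smallest and largest values in `weights` come out at approximately 0.09478 and 1.56698, which agrees with the range quoted for w_new.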

3.2 Early termination of CU partition based on spatial correlation

There is a strong correlation between the pixels of each frame and between the coding units at each CTU level. Therefore, the average depths of the adjacent CTUs (upper-left CTU, upper CTU and left CTU) can be used to predict the depth range of the current CTU based on spatial correlation. The flowchart is shown in Fig. 2. Each DR value corresponds to a depth interval, as listed in Table 2.

Fig. 2 CTU premature termination algorithm

Table 2 DR value and corresponding depth range

First, we classify the CTU partitions [8, 17]. As shown in Fig. 3, the value above each CTU partition [4] is the average depth of that CTU, and the range to the left of each row covers all the depths of the CTU partitions in that row. When the average depth of the CTU is less than or equal to 1, the prediction depth range of the CTU partition is [0, 1]. When the average depth is equal to 1.25, 1.5 or 1.75, the prediction depth range is [1, 2]. When the average depth is greater than or equal to 1.3125 and less than or equal to 2.4375, excluding the two values 1.5 and 1.75, the prediction depth range is [1, 3]. When the average depth is greater than or equal to 2.0625, the prediction depth range is [2, 3]. For example, if the average depth of the CTU is greater than or equal to 2.0625, the CTU partition may contain only depths 2 and 3, and not depths 0 and 1.

Fig. 3 Types of CTU partition

According to Table 2, the prediction depth ranges of CTU partitions are classified into four categories: T1, T2, T3 and T4. Each category includes all the depths in the corresponding depth range. For example, if the prediction depth category of a CTU is T2, the CTU partition has both depth 1 and depth 2. The prediction depth ranges [0, 1], [1, 2], [1, 3] and [2, 3] correspond to categories T1, T2, T3 and T4, respectively. The prediction depth ranges of T3 and T4 overlap. To split accurately, we define the prediction depth category as T3 when the average depth of the CTU is in [2.0625, 2.4375] and as T4 when the average depth is greater than 2.4375, as shown in formula (7).

$$ \left\{\begin{array}{ll}\mathrm{Prediction\ Depth\ Range} & \mathrm{Average\ Depth\ of\ CTU}\\ \mathrm{Depth}:0,1 & Depth\le 1\\ \mathrm{Depth}:1,2 & Depth=1.25/1.5/1.75\\ \mathrm{Depth}:1,2,3 & 1.3125\le Depth\le 2.4375\ \left( Depth\ne 1.5/1.75\right)\\ \mathrm{Depth}:2,3 & Depth>2.4375\end{array}\right. $$
(7)

Because of the non-uniform sampling of ERP projection, the weights near the pole areas are close to 0. We can therefore modify the decision conditions appropriately to further decrease coding time. Data analysis and experiments show that in the projected region with weight w < 0.4 (w is the WS-PSNR weight), the decision conditions can be modified as shown in formula (8). That is, CTUs whose average depth lies in (2, 2.4375] are moved from category T3 to T4, which avoids traversing depth 1. In addition, the case of average depth = 2 is merged into the prediction depth range [1, 2], which avoids traversing depth 3 but also introduces prediction errors.

$$ \left\{\begin{array}{ll}\mathrm{Prediction\ Depth\ Range} & \mathrm{Average\ Depth\ of\ CTU}\\ \mathrm{Depth}:0,1 & Depth\le 1\\ \mathrm{Depth}:1,2 & Depth=1.25/1.5/1.75/2\\ \mathrm{Depth}:1,2,3 & 1.3125\le Depth<2\ \left( Depth\ne 1.5/1.75/2\right)\\ \mathrm{Depth}:2,3 & Depth>2\end{array}\right. $$
(8)
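The two decision rules in formulas (7) and (8) can be written as one small function. This is a minimal sketch: the function name and the `pole_region` flag (meant to be true where the WS-PSNR weight is below 0.4) are hypothetical. It assumes the average depth is a multiple of 1/16, as the thresholds suggest, so the equality tests on floating-point values are exact. Average depths that neither formula covers raise an error rather than being guessed.

```python
# Prediction depth ranges of the four categories, as (min_depth, max_depth).
T1, T2, T3, T4 = (0, 1), (1, 2), (1, 3), (2, 3)

def predict_depth_range(avg_depth, pole_region=False):
    """Map the average depth of a CTU to a prediction depth range.

    pole_region=False follows formula (7); pole_region=True follows
    formula (8), intended for regions with WS-PSNR weight w < 0.4.
    """
    if avg_depth <= 1:
        return T1
    if pole_region:
        if avg_depth in (1.25, 1.5, 1.75, 2.0):
            return T2
        if 1.3125 <= avg_depth < 2:
            return T3
        if avg_depth > 2:
            return T4
    else:
        if avg_depth in (1.25, 1.5, 1.75):
            return T2
        if 1.3125 <= avg_depth <= 2.4375:
            return T3
        if avg_depth > 2.4375:
            return T4
    # Values such as 1.0625 are not covered by either formula.
    raise ValueError(f"average depth {avg_depth} is not covered")
```

For instance, an average depth of 2.25 maps to (1, 3) under formula (7) but to (2, 3) in a pole region, so depth 1 is no longer traversed there.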

However, the pole areas have lower weight coefficients, which means they have little impact on overall quality as assessed by WS-PSNR, and the decision conditions are changed only in the regions with weight w < 0.4. Data analysis confirms that this modification changes the rate-distortion cost only slightly while further accelerating CU partition.

In this paper, the early termination of CU partition works as follows. According to Fig. 3 and Table 2, the three adjacent CTUs are classified by their average depth to obtain their prediction depth ranges. We then obtain the depth category for each prediction depth range and look up the corresponding DR value in Table 2. Next, the three DR values are put into a list and sorted in ascending order. We take the middle DR value of the list, look up its prediction depth range, and use this range as the prediction depth range of the current CTU.

However, this algorithm has a problem: narrow intervals such as [0, 1], [1, 2] and [2, 3] may lead to misjudgment. For example, suppose that after obtaining the depth categories of the adjacent CTUs and looking up their DR values in Table 2, the sorted list is [0, 0, 2]. The middle DR value is 0, which corresponds to category T1 ([0, 1]) according to Table 2. However, after fully traversing the current CTU, its actual category turns out to be T2 ([1, 2]). This is a misjudgment caused by the narrow interval. Therefore, the algorithm adds two extra DR values (1 and 5), as shown in Table 2. The list is then expanded to [0, 0, 1, 2, 5] and we take the middle DR value 1, which means the prediction depth range of the current CTU is [0, 2] according to Table 2. DR 1 ([0, 2]) covers the actual range [1, 2], which reduces the misjudgment. When all three neighboring values in the list are 0, the current CTU is most likely in a smooth area, so category T1 is highly likely. Similarly, DR = 5 is added to cope with regions of higher complexity and reduces the misjudgment of category T4. Therefore, this paper initializes the list to [1, 5], adds the DR values of the prediction depth categories of the adjacent CTUs, and takes the middle DR value as the prediction depth category of the current CTU.
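The median selection described above is small enough to sketch directly. The function name is hypothetical, and the behaviour when fewer than three neighboring CTUs are available (for example at frame borders) is not described in the text, so this sketch simply rejects that case.

```python
def predict_dr(neighbor_drs):
    """Return the middle DR value of the list [1, 5] plus the DR values
    of the upper-left, upper and left CTUs."""
    if len(neighbor_drs) != 3:
        # Assumption: exactly three neighbors are available.
        raise ValueError("expected the DR values of three adjacent CTUs")
    candidates = sorted([1, 5] + list(neighbor_drs))
    return candidates[len(candidates) // 2]  # middle of five values
```

With the neighbors from the example, `predict_dr([0, 0, 2])` yields 1 instead of 0, while three smooth neighbors, `predict_dr([0, 0, 0])`, still yield 0.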

3.3 Fast intra prediction mode decision

The intra angular prediction mode [19, 31] is strongly related to the direction of pixel variation in the current coding unit. If we know the direction in which the pixel variation of the coding unit is smallest, we know the approximate range of the angular prediction in advance. In this way, the range of the angular mode can be predicted roughly, which reduces the encoding complexity: it not only skips the rough selection process, but also greatly reduces the number of intra prediction modes involved in the RDO calculation. This paper uses the Prewitt operator to extract the direction of pixel variation in a 4 × 4 coding unit. Figure 4 shows the Prewitt operator template:

Fig. 4 Prewitt operator template

The Prewitt operator works on a 3 × 3 pixel matrix whose center point is f(x, y). First, the template is used to calculate the gradient in each of four directions, as shown in Eqs. (9) to (12). G_x and G_y are then obtained by combining the four gradient values, as shown in Eqs. (13) and (14). After G_x and G_y have been obtained and averaged, the direction of maximum pixel variation is obtained using Eq. (15), and finally the predicted angular mode is obtained by Eqs. (16) and (17).

$$ {G}_H=f\left(x-1,y+1\right)+f\left(x,y+1\right)+f\left(x+1,y+1\right)-f\left(x-1,y-1\right)-f\left(x,y-1\right)-f\left(x+1,y-1\right) $$
(9)
$$ {G}_V=f\left(x+1,y-1\right)+f\left(x+1,y\right)+f\left(x+1,y+1\right)-f\left(x-1,y-1\right)-f\left(x-1,y\right)-f\left(x-1,y+1\right) $$
(10)
$$ {G}_{45}=f\left(x-1,y\right)+f\left(x-1,y+1\right)+f\left(x,y+1\right)-f\left(x,y-1\right)-f\left(x+1,y-1\right)-f\left(x+1,y\right) $$
(11)
$$ {G}_{135}=f\left(x-1,y-1\right)+f\left(x,y-1\right)+f\left(x-1,y\right)-f\left(x+1,y+1\right)-f\left(x,y+1\right)-f\left(x+1,y\right) $$
(12)
$$ {G}_x={G}_V+\frac{G_{45}}{2}+\frac{G_{135}}{2} $$
(13)
$$ {G}_y={G}_H+\frac{G_{45}}{2}-\frac{G_{135}}{2} $$
(14)
$$ \theta =\arctan \left(\frac{\overline{G_y}}{\overline{G_x}}\right),\quad \theta \in \left(-0.5\pi, 0.5\pi \right) $$
(15)
$$ {\theta}^{new}=\left\{\begin{array}{ll}\theta +0.5\pi, & \theta \in \left(-0.25\pi, 0.5\pi \right)\\ \theta +1.5\pi, & \theta \in \left(-0.5\pi, -0.25\pi \right)\end{array}\right. $$
(16)
$$ mode=-32\times \frac{\theta^{new}}{\pi }+42 $$
(17)
$$ \mid {\theta}_i-{\theta}_j\mid <0.4,\kern0.5em i,j=1,2,3,4\kern0.5em \mathrm{i}\ne \mathrm{j} $$
(18)

A 4 × 4 PU contains four 3 × 3 pixel matrices. We denote the gradient angles of these four matrices by θ1, θ2, θ3 and θ4. The texture correlation of the four sub-blocks is then used to judge how significant the texture information in the current PU is. θ1, θ2, θ3 and θ4 form six different pairs. If formula (18) holds for a pair, the two corresponding sub-blocks have strong texture correlation [10], and a counter N (initialized to 0) is increased by 1. If N is less than 3, the texture direction of the current PU is not significant: the texture information of the PU is disordered and a prediction mode estimated from it is probably inaccurate. In this case, the prediction modes with the lowest cost selected by RMD are used as the candidate prediction modes. If N is greater than 2, the texture direction of the current PU is significant and we calculate the predicted angular mode by formulas (15), (16) and (17).
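The 4 × 4 procedure of Eqs. (9) to (18) can be sketched as follows. This is an illustration under several assumptions that the text does not settle: the block is indexed as `block[y][x]`, each sub-block angle is computed from that sub-block's own G_x and G_y, a zero G_x is treated as the ±0.5π limit of the arctangent, a completely flat sub-block gives no angle, the boundary θ = -0.25π is assigned to the first branch of Eq. (16), and the result of Eq. (17) is rounded to the nearest integer and clamped to the angular modes 2 to 34. The function names are hypothetical.

```python
import math
from itertools import combinations

def prewitt_gradients(f, x, y):
    """Return (G_x, G_y) at centre (x, y) following Eqs. (9) to (14)."""
    def p(dx, dy):
        return f[y + dy][x + dx]  # assumption: rows are y, columns are x
    g_h = p(-1, 1) + p(0, 1) + p(1, 1) - p(-1, -1) - p(0, -1) - p(1, -1)
    g_v = p(1, -1) + p(1, 0) + p(1, 1) - p(-1, -1) - p(-1, 0) - p(-1, 1)
    g_45 = p(-1, 0) + p(-1, 1) + p(0, 1) - p(0, -1) - p(1, -1) - p(1, 0)
    g_135 = p(-1, -1) + p(0, -1) + p(-1, 0) - p(1, 1) - p(0, 1) - p(1, 0)
    return g_v + g_45 / 2 + g_135 / 2, g_h + g_45 / 2 - g_135 / 2

def angle(g_x, g_y):
    """Eq. (15); None for a flat block, +/-pi/2 when G_x is zero."""
    if g_x == 0:
        return None if g_y == 0 else math.copysign(math.pi / 2, g_y)
    return math.atan(g_y / g_x)

def predict_mode_4x4(block):
    """Return the predicted angular mode, or None to fall back to RMD."""
    centers = [(1, 1), (2, 1), (1, 2), (2, 2)]  # four 3x3 matrices
    grads = [prewitt_gradients(block, x, y) for x, y in centers]
    thetas = [angle(gx, gy) for gx, gy in grads]
    if any(t is None for t in thetas):
        return None
    # Formula (18): count the pairs whose angles differ by less than 0.4.
    n = sum(1 for a, b in combinations(thetas, 2) if abs(a - b) < 0.4)
    if n < 3:
        return None  # texture direction not significant
    theta = angle(sum(g[0] for g in grads) / 4, sum(g[1] for g in grads) / 4)
    if theta is None:
        return None
    # Eq. (16), then Eq. (17) rounded and clamped to modes 2..34.
    theta_new = theta + (0.5 if theta >= -0.25 * math.pi else 1.5) * math.pi
    return min(34, max(2, round(-32 * theta_new / math.pi + 42)))
```

Under these assumptions, a block whose values change only along x (vertical stripes) is mapped to mode 26, and a block whose values change only along y is mapped to mode 10.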

An 8 × 8 PU contains 36 such pixel matrices, which requires a large amount of computation. For PU sizes of 16 × 16, 32 × 32 and 64 × 64, the computational cost of the pixel matrices is even higher, so this paper does not use the Prewitt operator for these sizes. Statistical analysis and tests show that complexity can instead be reduced by removing unnecessary prediction modes. This paper first analyzes the best mode and the cost calculated by the rough selection process [5], and then proposes an efficient mode decision algorithm that performs RMD in two steps to reduce computational complexity. In the first RMD, the 35 original modes are reduced to 11 modes: 0, 1, 2, 6, 10, 14, 18, 22, 26, 30 and 34, as shown in Fig. 5. After the first RMD, we identify the first mode (FM) in the resulting list and put the modes adjacent to FM (FM - 2, FM - 1, FM + 1 and FM + 2) together with modes 0, 1, 2, 6, 10, 14, 18, 22, 26, 30 and 34 into a new list for the second RMD. After these two phases, the first two modes in the final list obtained from the second RMD are passed to rate-distortion optimization (RDO). Compared with the original HEVC, this reduces the number of unnecessary candidate modes in RDO [26]. The detailed algorithm is as follows.

Fig. 5 The 11 modes for the first RMD

First, we reduce the 33 angular modes to 9 angular modes [25] at equal intervals, as shown by the red lines in Fig. 5. The first RMD traverses 11 modes: the nine angular modes, DC and PLANAR. The list obtained after the first RMD is defined as MyMode. Second, we check whether the FM in MyMode is mode 0 or 1. If it is, the best mode of the PU is likely to be mode 0 or 1; if the FM is an angular mode, the best mode of the PU is very likely to be a mode adjacent to the FM. We then add the modes adjacent to the FM to the 11 modes of the first RMD to form a new list for the second RMD.

For different PU sizes, we use two different algorithms to obtain the best prediction modes.

(1) PU size is 16 × 16, 32 × 32 or 64 × 64: We first check the FM in MyMode. If the FM is DC, PLANAR or VM (short for mode 26, the vertical mode), we retain three modes in MyMode. If the FM is any other angular mode, the four modes adjacent to the FM (FM - 2, FM - 1, FM + 1 and FM + 2) are added to the list of 11 modes for the second RMD, and the first two modes in the final list obtained after the second RMD are taken as the prediction modes of the PU. Compared with the original HEVC framework, this removes one prediction mode. For example, if the FM in MyMode after the first RMD is angular mode 6, the four modes adjacent to it are 4, 5, 7 and 8; these are added to the list of 11 modes for the second RMD.

(2) PU size is 8 × 8: After the first RMD, we check whether the FM and the SM (short for the second mode) in MyMode are both DC or PLANAR. If so, the first two modes of MyMode are chosen as the best prediction modes. If the FM or the SM is an angular mode, the four modes adjacent to that angular mode are added to the list of 11 modes for the second RMD, and the first two modes of the final list obtained after the second RMD are taken as the prediction modes.

Note that when the angular mode is 2 or 34, only two adjacent modes exist.
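The construction of the candidate list for the second RMD can be sketched as below. The names are hypothetical and the sketch covers only the list construction for one angular reference mode (the FM, or the SM in the 8 × 8 case); the cost evaluation and the size-specific shortcuts for DC, PLANAR and mode 26 are not modeled.

```python
# The 11 modes traversed by the first RMD: PLANAR (0), DC (1) and
# nine equally spaced angular modes.
FIRST_RMD_MODES = [0, 1, 2, 6, 10, 14, 18, 22, 26, 30, 34]

def adjacent_angular_modes(mode):
    """Return the modes at distance 1 and 2 from an angular mode,
    restricted to the valid angular range 2..34."""
    if mode < 2:
        return []  # PLANAR and DC have no adjacent angular modes
    return [m for m in (mode - 2, mode - 1, mode + 1, mode + 2)
            if 2 <= m <= 34]

def second_rmd_candidates(reference_mode):
    """Return the sorted candidate list for the second RMD."""
    return sorted(set(FIRST_RMD_MODES) | set(adjacent_angular_modes(reference_mode)))
```

For the example in the text, `adjacent_angular_modes(6)` gives `[4, 5, 7, 8]`, so the second RMD evaluates 15 modes instead of 35; at the ends of the angular range, `adjacent_angular_modes(2)` gives `[3, 4]` and `adjacent_angular_modes(34)` gives `[32, 33]`.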

4 Algorithm process

(1) Calculate the newly defined w_new and obtain QP_new according to formula (6). Then set it as the current slice parameter;

(2) Classify the CTU partitions and obtain the prediction depth range from the average CTU depth. Then obtain the DR values according to Table 2, put them in a list and sort the list in ascending order. Finally, take the middle DR value of the list to predict the depth range of the current CTU based on spatial correlation. In addition, considering the characteristics of ERP projection, relax the decision conditions for pole areas with weight w < 0.4, which further decreases the encoding time;

(3) In intra prediction, use the Prewitt operator to analyze the texture of 4 × 4 PUs. If the texture correlation is strong enough, predict the angular mode directly. 8 × 8 PUs and 16 × 16, 32 × 32 and 64 × 64 PUs are handled separately. The first RMD traverses only 11 modes (DC, PLANAR and nine equally spaced angular modes) and obtains the FM in MyMode. For the second RMD, the four modes adjacent to the FM are traversed together with the 11 modes of the first RMD to obtain the final list. The first two modes of the final list are taken as the prediction modes.

Figure 6 shows the architecture of the proposed algorithm:

Fig. 6 Architecture of the proposed algorithm


5 Experimental results and analysis

To verify the feasibility of the proposed compression algorithm, we integrate it into HM-16.16-360Lib-4.0 and test its rate-distortion performance and coding complexity. The experimental platform is an Intel Core i7-7700 CPU @ 3.60 GHz with 8.0 GB of RAM. The coding configuration is All Intra Main10 (AI-Main10). The number of coded frames is 120 and the QP values are 22, 27, 32 and 37. To measure the rate-distortion performance of the algorithm, BD-rate is used to represent the bit rate variation at the same image quality. ΔT measures the time saving and WS-PSNR measures the change in image quality. The time reduction is calculated by formula (19), where T_HM16.16 is the coding time of HM16.16, T_proposed is the coding time of the proposed algorithm, and ΔT is the time reduction. The decrease in WS-PSNR of the Y component is calculated using formula (20).

$$ \kern1em \Delta T=\frac{T_{HM16.16}-{T}_{proposed}}{{\mathrm{T}}_{HM16.16}}\times 100\% $$
(19)
$$ \Delta {\mathrm{WSPSNR}}_Y={WSPSNR}_{HM16.16}-{WSPSNR}_{proposed} $$
(20)

In this paper, we use twelve standard test sequences from the JVET proposals JVET-D0026 [1], JVET-D0039 [3], JVET-D0053 [7], JVET-G0147, JVET-D0143 and JVET-D0179. Prior to encoding, the test sequences are converted to a lower-resolution ERP format (for accurate quality assessment). For 8 K and 6 K ERP video, the encoding size is set to 4096 × 2048, and for 4 K ERP video, the encoding size is set to 3328 × 1664.

In general, different test sequences achieve different time savings, mainly because of their different textures and contents. The experimental results in Table 3 show that, compared with HM16.16, the proposed algorithm reduces the average BD-rate by 0.3%, decreases coding time by 42.4% and improves WS-PSNR by 0.03 dB. This is because the proposed algorithm uses spatial correlation to terminate CU partition early based on prediction of the depth range. By using the texture features of the PU and removing unnecessary prediction modes, the proposed algorithm reduces the coding complexity. In adaptive QP compensation, the QP is adaptively compensated based on the pixel coordinate in ERP projection, which makes encoding more efficient. The proposed algorithm saves more time for sequences such as ChairliftRide, Balboa and SkateboardInLot, mainly because their backgrounds have relatively simple textures, so the encoding of unnecessary smaller CUs can be skipped. It saves less time for sequences such as KiteFlite and BranCastle2, because their textures are more complex and are most likely to be partitioned into smaller CUs; their depths are therefore larger than those of other sequences, and the early termination algorithm has only a small effect on their encoding time. Because of the uneven sampling of ERP projection, the weight is very small at the pole areas, which means that distortion in these areas has very little impact on overall quality as assessed by WS-PSNR. We can therefore relax the decision conditions for these areas to further decrease the encoding time. Figure 7 compares the rate-distortion performance of HM16.16 and the proposed algorithm for the sequences AerialCity, BranCastle2, ChairliftRide and SkateboardInLot. Figure 8 shows the differences between the proposed algorithm and HM16.16 in CU partition. Table 4 shows the accuracy of the proposed algorithm relative to HM16.16 in early termination of CU partition.

Table 3 Experimental data
Fig. 7 Comparison between the proposed algorithm and HM16.16 in rate-distortion performance

Fig. 8 Comparison between the proposed algorithm and HM16.16 in CU partition

Table 4 Accuracy of CTU classification in this paper

6 Conclusion

This paper presents adaptive QP compensation, early termination of CU partition and adaptive mode selection to reduce the computational complexity of 360-degree video coding. The proposed algorithm is implemented on the HEVC reference software HM16.16. The QP compensation is adapted according to the pixel coordinate in ERP projection. The early termination of CU partition is based on prediction of the depth range and on spatial correlation. The adaptive mode selection decreases the number of modes in the RMD process and, if the texture correlation of a block is strong, predicts the mode directly. The experimental results show that, compared with HM16.16, the proposed algorithm reduces the average BD-rate by 0.3%, decreases the average coding time by 42.4% and improves WS-PSNR by 0.03 dB. This demonstrates the practical value of the algorithm.