1 Introduction

An intelligent user interface for interactive multimedia is an emerging field of study, driven by the rapid development of hardware devices such as multidimensional sensors and motion detectors [1]. Interactive multimedia refers to the integration of digital media, including combinations of electronic text, graphics, moving images, and sound, into a structured digital environment that allows people to interact with the data over the Internet, telecommunication networks, and interactive digital television. Many basic techniques and services, such as image processing, pattern recognition, computer vision, and multimedia storage/transmission, support intelligent user interfaces, the semantic web, and smart grid services. This paper focuses on a video coding technique that improves the performance of various multimedia services for interactive 3D video processing and cloud computing [2, 3, 4].

A new-generation video coding standard, called High Efficiency Video Coding (HEVC) [5], has been developed by the Joint Collaborative Team on Video Coding (JCT-VC). The JCT-VC is a group of video coding experts created by ITU-T Study Group 16 (VCEG) and ISO/IEC JTC 1/SC 29/WG 11 (MPEG) in 2010.

Although HEVC substantially improves coding efficiency, it has higher computational complexity than H.264/AVC [6] because of its high-complexity coding tools and the high resolution of target sequences. Therefore, reducing encoding time with acceptable loss is an important research issue.

The video encoding process in HEVC uses three block-structure units: (1) the coding unit (CU), the basic block unit analogous to the macroblock in H.264/AVC; (2) the prediction unit (PU), used for motion estimation, rate–distortion optimization, and mode decision; and (3) the transform unit (TU), used for transform and entropy coding. The encoding process proceeds recursively from the coding tree block (CTB), which has the largest block size, down to its four child CUs.

An early partition decision algorithm [7] attempts to terminate the mode-decision process after checking the inter mode with each of the PU partition modes. Kim et al. [8] proposed an early termination method based on the average rate–distortion cost of previously skipped CUs. Other fast algorithms explore further components [9, 10]. To determine the CU size before the general RD optimization process, a support vector machine (SVM)-based CU splitting early termination algorithm was proposed [11]. Early determination algorithms have been proposed that use SKIP conditions from fast mode decision schemes [12–14]. In [15], CU mode decision and TU size selection using variance values of coding mode costs were reported. A fast intra-prediction approach based on gradients was used in [16]. In [17], a residual quadtree mode decision algorithm considering rate–distortion efficiency was reported.

A two-stage process approach was proposed in [18]. Wu et al. [19] reported a method using the spatial homogeneity and temporal stationarity characteristics of video objects. A direct prediction method based on inter-block correlation and an adaptive rate–distortion cost threshold for early stopping was also proposed by Kim et al. [20]. In [21], a tree-pruning algorithm was proposed for early CU termination. Shen et al. [22] proposed an early CU size determination algorithm. A fast inter-prediction method that omits unnecessary processing based on the probability of the best inter mode of the nN×nN PU was proposed [23]: if inter 2N×2N SKIP is selected as the best mode, the advanced motion vector prediction (AMVP) mode is skipped. Ahn et al. [24] proposed a fast CU partitioning decision algorithm based on sample adaptive offset (SAO) parameter values, PU and TU sizes, MV sizes, and coded block flag (cbf) information from reference blocks to reduce the complexity of the CU split decision.

In this paper, we propose an early CU and PU mode decision algorithm that reduces the computational complexity of the CU and PU mode decision process in the HEVC encoder. Within a CTB, CU partitioning is performed as follows. First, based on local statistics of the rate–distortion (RD) costs and motion vectors, an early CU splitting termination (CST) method decides whether a CU should be decomposed into four lower-dimension CUs. Then, a block motion complexity measure that considers spatio-temporal PU information is defined to enable early termination through fast PU decision.

This paper is organized as follows. Section 2 describes the proposed fast CU splitting termination (CST) and fast PU decision methods. Section 3 presents the coding performance of the proposed algorithm combining the two methods. Section 4 gives concluding remarks.

2 Proposed work

2.1 Fast coding unit splitting termination (CST) method

The main objective is to select the lowest RD cost before all combinations are evaluated. We design a ratio function for \({\text{CU}}_{d} (i)\) at depth \(d\), defined as:

$$r_{i} = \frac{{{\text{RD}}_{\text{cost}} ({\text{CU}}_{d} \left( i \right))}}{{{\text{RD}}_{\text{cost}} ({\text{CU}}_{d - 1} )}},\quad {\text{where }}\mathop \sum \limits_{i = 0}^{3} r_{i} \le 1.$$
(1)

When a CU is split, it is divided into four child CUs, so a ratio-function value can be computed for each newly created CU. The parameter \(r_{i}\) denotes the ratio of the RD cost of the current CU to that of its parent CU. We performed an analysis on the original HM reference software for four sequences of different resolutions, checking the ratio-function values of all split CUs for four QP values (22, 27, 32, 37). As shown in Fig. 1, all split CUs have ratio-function values smaller than 0.25.
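As an illustrative sketch, the ratio function of Eq. (1) and its 0.25 threshold check can be written as follows. The function names and arguments are our own; the actual computation is embedded in the HM encoder's RD cost bookkeeping, not exposed as an API like this.

```python
def ratio_function(rd_cost_child, rd_cost_parent):
    """Eq. (1): ratio r_i of a child CU's RD cost to its parent CU's RD cost."""
    return rd_cost_child / rd_cost_parent

def ratio_below_threshold(rd_cost_child, rd_cost_parent, threshold=0.25):
    """Check whether r_i falls below the 0.25 threshold used by the CST decision."""
    return ratio_function(rd_cost_child, rd_cost_parent) < threshold
```

Because the four children's ratios sum to at most 1, a single child's ratio near 0.25 indicates that its RD cost is not unusually small relative to its siblings.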

Fig. 1
figure 1

Ratio function analysis for four sequences from Class A to D

Therefore, when the ratio function of a child CU is lower than those of its siblings, the child is unlikely to be split at the next depth level. Accordingly, this parameter can be used as a threshold for deciding whether to split the current CU to the next depth level.

After checking the ratio function, we incorporate the different motion activities in a video sequence. In a previous algorithm [21], if a CU is coded in SKIP mode, the CU is not split further. Motivated by this fact, we explore the other prediction modes of the PU. INTRA and INTER predictions have different PU types: the INTER modes are 2N×2N, 2N×N, N×2N, N×N (presently unused), 2N×nU, 2N×nD, nL×2N, and nR×2N; the INTRA modes are 2N×2N and N×N; SKIP mode uses only the 2N×2N size.

A CU is divided into four CUs of lower dimensions after all mode calculations are completed. PU modes are processed recursively from the CTB down to all child CUs. Table 1 classifies all prediction modes according to motion activity. This kind of motion-activity classification has been applied successfully in the H.264/AVC codec [25, 27].

Table 1 Classified weight factors for all prediction modes according to the motion activities

We assume that if a region in a video sequence has relatively slow motion activity compared to other regions, there is a high chance that the slowly moving region will not be split within that hierarchy of the CTB. Based on this assumption, we performed an experiment measuring the percentage of CUs that are not split at any level of the CTB hierarchy. We tested four sequences of different resolutions for four QP values over 20 frames and observed the CUs that are not split for each PU partition mode weight factor (PU_WF).

Table 2 shows the result of this experiment. From this table, it is clear that when PU_WF is 0 (SKIP mode), there is a very high probability that the CU will not be split at the next level. Hence, if PU_WF is 0, we can directly decide that the CU does not need to be split. Also, if PU_WF is 1 or 2, there is still a reasonably high probability that the CU will not be split at the next level.

Table 2 Results of the percentage of CUs which are not split in the next level

For the cases where PU_WF is 1 or 2, we calculate a local average of the RD cost values of all encoded CUs. This parameter is not static, as it is updated dynamically from the encoded CUs. In this process, the final PU mode and the corresponding RD cost value are checked after a CU is encoded. Equations (2) and (3) are evaluated after encoding a CU, for each CU dimension \(d\) and PU_WF value \({\text{wf}}\) satisfying \({\text{wf}} < 3\).

$${\text{sum}}_{d}^{\text{wf}} = \left( {{\text{avg}}_{d}^{\text{wf}} \times {\text{count}}_{d}^{\text{wf}} } \right) + {\text{RDcost}}_{d}^{\text{wf}} ,$$
(2)
$${\text{avg}}_{d}^{\text{wf}} = \frac{{{\text{sum}}_{d}^{\text{wf}} }}{{{\text{count}}_{d}^{\text{wf}} + 1}}$$
(3)
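Equations (2) and (3) amount to a running average of RD costs, keyed by CU dimension \(d\) and weight factor \({\text{wf}}\). The following sketch illustrates the update; the dictionary-based state and class name are ours, not HM's actual bookkeeping.

```python
from collections import defaultdict

class LocalAverageRDCost:
    """Running average avg_d^wf of RD costs per (dimension d, PU_WF wf), wf < 3."""
    def __init__(self):
        self.avg = defaultdict(float)   # avg_d^wf
        self.count = defaultdict(int)   # number of CUs accumulated so far

    def update(self, d, wf, rd_cost):
        if wf >= 3:                     # the average is tracked only for wf < 3
            return self.avg[(d, wf)]
        s = self.avg[(d, wf)] * self.count[(d, wf)] + rd_cost   # Eq. (2)
        self.avg[(d, wf)] = s / (self.count[(d, wf)] + 1)       # Eq. (3)
        self.count[(d, wf)] += 1
        return self.avg[(d, wf)]
```

Storing only the current average and count avoids keeping the full RD cost history while producing the same value as averaging all encoded CUs of that dimension and weight factor.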

We performed a test to calculate the probability that a CU is not split when its RD cost is less than the local average RD cost. Table 3 shows the result for four sequences under the same test environment. As shown in Table 3, for PU_WF values of 0, 1, and 2, the probabilities lie roughly within 65–75 %, which indicates that the proposed model can be effective when encoding the current CU. In addition, an absolute motion vector (MV) measure is calculated in the proposed method. The average of the motion vectors is obtained from both List_0 and List_1 in the HM reference software. For simplicity, we use only the magnitudes of the x and y MV components, as shown in Eq. (4).

Table 3 Results of the percentage of CUs which are not split in the next level if their \({\text{RDcost}} < {\text{avg}}_{d}^{\text{wf}}\)
$${\text{MV}}_{\text{abs}} = 0.5 \times \left( {\mathop \sum \limits_{i = 0}^{1} \left| {{\text{MV}}_{x} \left( i \right)} \right| + \left| {{\text{MV}}_{y} \left( i \right)} \right|} \right) .$$
(4)

Based on the above equation, we assume that a CU with low motion should have a high chance of not being split. Since \({\text{MV}}_{\text{abs}}\) is directly related to the motion of a CU, we can infer that a CU with a low \({\text{MV}}_{\text{abs}}\) value has a high chance of not being split in the current hierarchy of the CTB. We checked the percentage of CUs that are not split for different values of \({\text{MV}}_{\text{abs}}\). The experimental result is shown in Table 4: for \({\text{MV}}_{\text{abs}} = 0\), there is a high probability that the corresponding CU will not be split. However, since the probability values associated with \({\text{avg}}_{d}^{\text{wf}}\) and \({\text{MV}}_{\text{abs}}\) are not very high (65–75 %), we cannot make a decision from either alone. Therefore, we combine them with other features in our main algorithm.
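Eq. (4) can be sketched directly; the function name and the tuple representation of a motion vector are illustrative assumptions, since HM stores MVs in its own structures.

```python
def mv_abs(mv_list0, mv_list1):
    """Eq. (4): average of |MV_x| + |MV_y| over the List_0 and List_1 vectors.
    Each argument is an (x, y) motion vector tuple."""
    return 0.5 * sum(abs(x) + abs(y) for x, y in (mv_list0, mv_list1))
```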

Table 4 Results of the percentage of CUs which are not split in the next level for different \({\text{MV}}_{\text{abs}}\) values

According to the above-mentioned observations, the proposed algorithm can determine early which CUs need to be split. The CU decision part of our algorithm has two stages. The first stage handles the CTU case: for a CTU, no information from higher levels is available, so the ratio function cannot be calculated. The second stage handles all other, lower-level CUs. Figure 2 shows the flowchart of the CU splitting termination (CST) algorithm. The proposed CST method first checks the PU mode weight factor. If PU_WF is 0 (SKIP mode), the decision is made directly that no splitting is needed. If PU_WF is 1 or 2, no direct decision can be made, so the two stages apply different checks. In the CTU case, the algorithm uses only \({\text{MV}}_{\text{abs}}\) and the local average RD cost of the corresponding PU dimension for PU_WF = 1 and 2: if the RD cost of the current PU is less than the average RD cost for the same dimension and \({\text{MV}}_{\text{abs}}\) is 0, the encoder decides that no splitting is needed. In the non-CTU case, the ratio function, \({\text{MV}}_{\text{abs}}\), and the local average RD cost of the corresponding PU dimension are used for PU_WF = 1 and 2. The non-CTU case follows the same process but can additionally use the ratio function with higher-level information: if the RD cost is less than the average RD cost of the corresponding PU dimension and the ratio function (R) is less than 0.25, the algorithm also decides that the CU does not need to be split.
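The two-stage decision can be summarized in pseudocode-like Python. This is a hedged sketch of the flow in Fig. 2, not the HM implementation; all parameter names are ours, and the non-CTU branch follows the textual condition (RD cost below average and R < 0.25).

```python
def skip_further_split(pu_wf, rd_cost, avg_rd_cost, mv_abs_val, is_ctu, ratio=None):
    """Return True if the CST method decides the current CU need not be split.
    ratio is r_i from Eq. (1); it is unavailable (None) in the CTU case."""
    if pu_wf == 0:                          # SKIP mode: never split further
        return True
    if pu_wf in (1, 2):
        if is_ctu:                          # CTU: no parent, no ratio function
            return rd_cost < avg_rd_cost and mv_abs_val == 0
        # non-CTU: additionally require the ratio function below 0.25
        return rd_cost < avg_rd_cost and ratio is not None and ratio < 0.25
    return False                            # higher-complexity modes: keep splitting
```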

Fig. 2
figure 2

Flowchart of local average of RD cost calculation of CST algorithm

2.2 Fast prediction unit (FPU) decision technique

In this subsection, we propose an effective prediction unit selection method based on correlation and block motion complexity (BMC) that can be applied together with the CST method proposed above. A natural video sequence has high spatial and temporal correlations. To analyze these correlations, we define a predictor set (Ω) consisting of \({\text{PU}}_{ 1}\) to \({\text{PU}}_{ 7}\). Figure 3 shows the position of each \({\text{PU}}_{i}\) relative to the current CU.

Fig. 3
figure 3

The position of adjacent PUs from current CU for (a) spatio-temporal and (b) depth correlation

Each \({\text{PU}}_{i}\), for \({\text{PU}}_{ 1}\) to \({\text{PU}}_{ 7}\), holds the mode of the corresponding PU neighboring the current CU. \({\text{PU}}_{ 1}\), \({\text{PU}}_{ 2}\), \({\text{PU}}_{ 3}\), \({\text{PU}}_{ 6}\), and \({\text{PU}}_{ 7}\) are spatial PUs, while \({\text{PU}}_{ 4}\) and \({\text{PU}}_{ 5}\) are temporal PUs. \({\text{PU}}_{ 1}\), \({\text{PU}}_{ 2}\), and \({\text{PU}}_{ 3}\) are the above-left, above, and left PUs of the current CU, respectively. \({\text{PU}}_{ 4}\) is the co-located PU in the forward direction and \({\text{PU}}_{ 5}\) is the co-located PU in the backward direction. \({\text{PU}}_{ 6}\) and \({\text{PU}}_{ 7}\) are the co-located parent PUs at depth-1 and depth-2, respectively.

We explored the correlation of each predictor \({\text{PU}}_{i}\) using 50 frames of four Class D sequences under the random access main profile configuration. Table 5 shows the probability \(P({\text{Mode}}_{\text{current}} = {\text{PU}}_{i} |{\text{PU}}_{i} , i = 1,2,3, \ldots ,7)\). It is clear that the temporal and depth correlation values are relatively small, while the correlations of \({\text{PU}}_{ 1} , {\text{PU}}_{ 2} , {\text{PU}}_{ 3}\) are very large.

Table 5 The probability of spatial, temporal, and depth correlation for adjacent PUs from current CU (\({\text{PU}}_{ 1}\) to \({\text{PU}}_{ 7}\) are located in Fig. 3)

To further investigate the temporal correlation of \({\text{PU}}_{ 4}\) and \({\text{PU}}_{ 5}\), we consider the temporal-distance structure of the reference pictures. Figure 4 shows the structure of the random access configuration when the basic group-of-pictures (GOP) size is 8. In this structure, each frame has a distance from its reference frame according to its position, represented as temporal \({\text{Level}}_{i}\). Temporal \({\text{Level}}_{ 1}\) corresponds to a distance of 8 from the reference frame, and \({\text{Level}}_{ 2} , {\text{Level}}_{ 3} ,\) and \({\text{Level}}_{ 4}\) correspond to distances of 4, 2, and 1, respectively, according to the temporal decomposition.

Fig. 4
figure 4

Coding order and level of temporal layer for each of the frames in random access configuration (size of the GOP = 8)

Table 6 shows the probability that the mode of the current CU is the same as the mode of the co-located CU in the forward or backward reference frame, taking into account the temporal distance \(d\) of the reference frame from the current frame. The smaller the temporal distance, the larger the probability that a PU is encoded with the same mode as the reference PU.

Table 6 The probability of correlation of the predicted mode according to the temporal distance

From the defined and observed factors, we can deduce a new block motion complexity model. To define block motion complexity, we designed weight factors according to the location relative to the current CU, as shown in Table 7. Table 7 assigns more weight to the spatially correlated PUs and to the upper level (depth \(d - 1\)) than to the other positions. Based on the mode weight factors designed for motion activities (Table 1), we define a block motion complexity (BMC) [28] from the predefined mode weight factors (γ) and the group of predictors in Ω to estimate the PU mode of the current CU.

Table 7 Weight factor for PUs (default setting and changing weight factor adaptively according to level of temporal layer)

The BMC is defined as follows [28]:

$${\text{BMC}} = \frac{{\mathop \sum \nolimits_{i = 1}^{N} W\left( {i,l} \right) \cdot k_{i} \cdot \gamma_{i} }}{{\mathop \sum \nolimits_{i = 1}^{N} W\left( {i,l} \right) \cdot k_{i} }},$$
(5)
$$W\left( {i,l} \right) = w_{i} + f\left( {i,l} \right),$$
(6)
$$f\left( {i,l} \right) = {\text{round}}\left( {l \cdot T_{\text{level}} } \right) \cdot T_{t} \cdot Tk_{i}$$
(7)

where \(N\) is the number of PUs (equal to 7) and \(W\left( {i,l} \right)\) is the weight-factor function for each mode. \(\gamma_{i}\) is the mode weight factor of \({\text{PU}}_{i}\) in Ω. \(k_{i}\) denotes the availability of \({\text{PU}}_{i}\): if the current CTB lies on a frame boundary, the above or left PU information is unavailable. When \({\text{PU}}_{i}\) is available, \(k_{i}\) is set to 1; otherwise, \(k_{i}\) is 0.

Equation (6) is an adaptive weight-factor function. The value \(w_{i}\) is the weight factor for \({\text{PU}}_{i}\), with the property \(\mathop \sum \nolimits_{i = 1}^{N} w_{i} = 1\). The value \(l\) denotes the temporal decomposition layer. In an 8-frame GOP, HEVC organizes the frames into temporal layers according to the distance between frames; the layers \({\text{Level}}_{ 1 , 2 , 3 , 4}\) correspond to distances of 8, 4, 2, and 1. If the current frame is close to its reference frame, the correlation is stronger than in the long-distance case. Therefore, the weight factors are changed adaptively when the PU has a temporal distance of 1 or 2, as shown in the second row of Table 7. \(T_{\text{level}}\) is set to 0.02 and is used to normalize the temporal level \(l\). \({\text{Tk}}_{i}\) is a control value used to increase or decrease the weight factors; it is set to 1, −2, or 0. For the temporally correlated \({\text{PU}}_{ 4}\) and \({\text{PU}}_{ 5}\), it is set to 1; for \({\text{PU}}_{ 6}\), it is set to −2; for the others, it is set to 0.

$$\left\{ {\begin{array}{*{20}l} {{\text{BMC}} < {\text{Th}}_{1} ,} \hfill & {\text{slow motion,}} \hfill \\ {{\text{Th}}_{1} \le {\text{BMC}} < {\text{Th}}_{2} ,} \hfill & {\text{medium motion,}} \hfill \\ {{\text{Th}}_{2} \le {\text{BMC}},} \hfill & {\text{complex motion or texture,}} \hfill \\ \end{array} } \right.$$
(8)

where \({\text{Th}}_{1}\) and \({\text{Th}}_{2}\) are set to 1 and 3, corresponding to the mode weight factors (PU_WF in Table 1) for slow motion and for complex motion or texture, respectively.
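Eqs. (5) and (8) can be sketched as an availability-masked weighted mean of the neighbors' mode weight factors, classified by the two thresholds. The adaptation of \(W(i,l)\) in Eqs. (6) and (7) is simplified here to precomputed weights, and all names are illustrative.

```python
def block_motion_complexity(gamma, weights, available):
    """Eq. (5). gamma[i]: mode weight factor of PU_i (Table 1);
    weights[i]: adaptive weight W(i, l); available[i]: k_i (1 or 0)."""
    num = sum(w * k * g for w, k, g in zip(weights, available, gamma))
    den = sum(w * k for w, k in zip(weights, available))
    return num / den if den else 0.0    # no available neighbors: treat as slow

def classify_motion(bmc, th1=1.0, th2=3.0):
    """Eq. (8): classify BMC into slow / medium / complex motion."""
    if bmc < th1:
        return "slow"
    return "medium" if bmc < th2 else "complex"
```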

Figure 5 shows the flowchart of the fast prediction unit decision algorithm. Before the PU modes are evaluated, the BMC is calculated and compared with each threshold. If the motion is slow, the proposed method evaluates only SKIP mode and inter 2N×2N. In the medium-motion case, the mode decision process additionally evaluates inter 2N×N and N×2N. When the motion is complex, all modes are evaluated and the mode search selects the best one.
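The candidate-mode pruning in Fig. 5 can be summarized as follows. The mode-name strings and the exact contents of the "complex" list are our illustrative assumptions; the paper specifies only that all modes are evaluated in the complex case.

```python
def candidate_modes(motion_class):
    """Candidate PU modes per motion class, per the FPU decision in Fig. 5."""
    modes = ["SKIP", "INTER_2Nx2N"]              # slow motion: only these
    if motion_class in ("medium", "complex"):
        modes += ["INTER_2NxN", "INTER_Nx2N"]    # medium motion adds these
    if motion_class == "complex":
        modes += ["INTER_2NxnU", "INTER_2NxnD",  # complex: remaining inter
                  "INTER_nLx2N", "INTER_nRx2N",  # (AMP) and intra modes
                  "INTRA_2Nx2N", "INTRA_NxN"]
    return modes
```

Restricting the candidate list for slow- and medium-motion blocks is what yields the FPU time savings reported in Sect. 3.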

Fig. 5
figure 5

Flowchart for fast prediction unit decision algorithm

3 Experimental results

The proposed algorithm was implemented in the original HM 10.0 (HEVC reference software). The test condition was random access (RA-Main). Two standard sequences per class, each 50 frames long, were used with various QP values (22, 27, 32, 37). The video sequences are classified as Class A to Class D, each with a different resolution: Class A—2560 × 1600 pixels, Class B—1920 × 1080 pixels, Class C—832 × 480 pixels, and Class D—416 × 240 pixels. Details of the encoding environment can be found in JCTVC-L1100 [29].

To evaluate the performance, we defined the measurement of ΔBit, \(\Delta {\text{PSNR}}_{Y}\), and ΔTime as:

$$\Delta {\text{Bit}} = {\text{Bit}}_{\text{proposed}} - {\text{Bit}}_{\text{original}}$$
(9)
$$\Delta {\text{PSNR}}_{Y} = {\text{PSNR}}_{{Y_{\text{proposed}} }} - {\text{PSNR}}_{{Y_{\text{original}} }}$$
(10)
$$\Delta {\text{Time}} = \frac{{{\text{Time}}_{\text{Proposed}} - {\text{Time}}_{\text{Anchor}} }}{{{\text{Time}}_{\text{Anchor}} }} \times 100$$
(11)

\(\Delta {\text{Time}}\) is a complexity comparison factor indicating the total encoding time saving (Eq. (11)). In Eq. (11), \({\text{Time}}_{x}\) denotes the total encoding time of method \(x\).
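For clarity, the three evaluation metrics of Eqs. (9)–(11) can be sketched as follows; the function and variable names are ours.

```python
def delta_bit(bit_proposed, bit_original):
    """Eq. (9): bit rate difference."""
    return bit_proposed - bit_original

def delta_psnr_y(psnr_proposed, psnr_original):
    """Eq. (10): Y-PSNR difference in dB."""
    return psnr_proposed - psnr_original

def delta_time(time_proposed, time_anchor):
    """Eq. (11): percentage encoding-time change; negative means time saving."""
    return (time_proposed - time_anchor) / time_anchor * 100
```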

Table 8 shows comparisons between the original HM 10.0 software and the proposed algorithm without any fast options, including the Bjøntegaard Delta bitrates (BDBR) [30]. Three component algorithms (CST, FPU, and MVM) are also shown in Table 8. It can be observed that CST reduces the encoding time by 9.05 to 27.57 % (18.41 % on average) with a BDBR of 1.44 % and losses of 0.29 % in bit rate and 0.04 dB in Y-PSNR. FPU reduces the computational complexity by 24.51 to 56.40 % (41.51 % on average); the BDBR between FPU and HM 10.0 is 1.65 %, with bit rate and Y-PSNR losses of 0.63 % and 0.04 dB. The time saving of MVM over HM 10.0 ranges from 19.18 to 25.81 % (22.98 % on average), with a 0.64 % bit rate increase and 0.03 dB quality loss; the BDBR of MVM is 1.35 % on average.

Table 8 The performance of the original HM reference software 10.0 and the proposed algorithm

The proposed method reduces the computational complexity by 32.03 to 61.01 % (51.72 % on average) with losses of 1.09 % in bit rate and 0.05 dB in Y-PSNR. The BDBR between the proposed algorithm and HM 10.0 is 2.65 % on average. The table shows that the proposed algorithm gives a superior result in terms of time reduction compared with the other fast algorithms, at the cost of a marginal BDBR loss of 2.65 % covering both bit rate and PSNR. Considering the time reduction, however, this loss is negligible.

The proposed algorithm speeds up the HEVC encoding system by up to 61.01 % on the BQTerrace sequence (Class B) compared to the full mode search. Compared to the MVM method, the proposed algorithm achieves a speed-up gain of up to 32.37 % with a smaller bit rate increase for Class B sequences. Overall, an average speed-up of over 52 % was obtained relative to the full mode search while keeping the bit rate increase negligible. For Class A sequences, an average speed-up of 46.35 % was achieved with a 1.07 % bit rate increase and 2.05 % BD-rate loss; compared to the MVM method, a speed-up of about 29 % was obtained.

In terms of bit rate, the proposed method achieved its minimum bit rate increase (0.32 %) in Class B sequences; for Class D sequences, the bit rate increase was up to 1.78 %. The proposed algorithm has a superior result in terms of time reduction compared with the other fast algorithms. Although the proposed method incurs a slightly higher bit rate than the other fast algorithms, its computational complexity is reduced by almost 33, 28.74, and 10 % relative to CST, MVM, and FPU, respectively. This means that the proposed algorithm can provide credible quality for larger video resolutions with a high speed-up factor. These results indicate that the proposed algorithm is well suited to real-time encoding systems without significant degradation of encoding performance.

Figures 6 and 7 show the RD performance for one test sequence from each of Classes A to D. In Fig. 6(a) and (b), the proposed method is very similar to the original HM 10.0 software, with negligible bit rate loss. However, for larger QP values, larger bit rate losses are observed in Fig. 7(a) and (b). Although our algorithm incurs a small loss in bit rate and PSNR, it achieves a significant time saving of at least 32.03 % and up to 61.01 %. For Class B, the proposed algorithm achieves very small losses of 0.37 % in bit rate and 0.04 dB in PSNR, with a 52.57 % time saving, without any fast options.

Fig. 6
figure 6

Rate–distortion (RD) curves for (a) Traffic (Class A) and (b) BQTerrace (Class B) sequences in random access, main profile condition

Fig. 7
figure 7

Rate–distortion (RD) curves for (a) BasketballDrill (Class C) and (b) BlowingBubbles (Class D) sequences in random access, main profile condition

For the BQTerrace (Class B) sequence, the quality loss is negligible; the proposed method performs almost identically to the original HM encoder. Over the entire bit range, the proposed algorithm delivers almost the same quality at the same bit rate, which means the suggested algorithm can maintain reliable video quality while speeding up the HM encoder by about 53 %. Up to 20 Mbps, the performance of the proposed algorithm is credible in terms of bit rate and image quality; from 25 Mbps, it begins to show a slight quality loss.

For the BasketballDrill (Class C) sequence, the performance of the proposed algorithm is similar to the original HM encoder in terms of bit rate and image quality up to 15 Mbps. In the high bit rate range (from 20 Mbps), a slightly larger quality loss of about 0.07 dB on average was observed. This means that the proposed algorithm provides a very high speed-up factor, although the quality loss may become larger than for higher-resolution video.

This loss is negligible, as shown in Figs. 6, 7, and 8. Figure 8 shows a subjective comparison between the 20th reconstructed frame of the original HM 10.0 reference software (Fig. 8(a)) and that of the proposed algorithm (Fig. 8(b)). The reconstructed images are very similar, and it is difficult to find any visual quality difference, which means the quality loss of the proposed method is negligible.

Fig. 8
figure 8

Comparison of the reconstructed images of the original HM 10.0 encoder and the proposed algorithm: Traffic sequence (2560 × 1600, Class A, QP = 37, the 20th frame). (a) Reconstructed image of the original HM 10.0 version. (b) Reconstructed image of the proposed algorithm

The proposed scheme is also structured so that it can run together with the other fast options in the HEVC standard. Therefore, the proposed algorithm can support real-time systems with intelligent user interfaces for interactive multimedia that handle large amounts of visual data, such as image processing data, broadcast media, cloud computing media, and web video data.

4 Conclusions

We have proposed a new fast encoding algorithm combining early CU splitting termination (CST) and fast PU decision (FPU) for HEVC-based interactive multimedia systems. The CST algorithm uses the RD costs of different CU dimensions and the motion complexity at the PU level. The FPU method is based on spatial, temporal, and depth correlation information and an adaptive weight-factor design using temporal distance. By combining CST and FPU, the proposed algorithm achieved, on average, a 51.72 % time saving over the original HM 10.0 software, with a BDBR of 2.65 % and a bit rate increase of 1.09 %. The proposed algorithm is useful for designing a real-time HEVC video encoding system with an intelligent user interface for interactive multimedia services.