1 Introduction

Motion-compensated (MC) image coding, which exploits frame-to-frame redundancy to achieve a high data compression rate, is one of the most popular inter-frame coding techniques [1, 2]. The H.26x family of video coding standards introduced a motion estimation (ME)/MC coding tool combined with an orthogonal transform (OT), such as the discrete cosine transform (DCT); this tool now plays an important role in the field of inter-frame coding. By combining conditional pixel replenishment with quantization control in the DCT coefficient domain, the H.26x standards achieve considerable coding efficiency and transmission bandwidth reductions.

In previous work, we developed a hybrid I-frame encoding method based on non-causal interpolative prediction and differential feedback quantization that utilizes intra-frame spatial correlation [3]. To verify the efficiency of this hybrid coding method, we coded four test sequences with it and achieved considerable coding efficiency [3].

In this paper, a new configuration for P-frame coding is presented. In designing this hybrid coding scheme, we show that an orthogonal transform need not be regarded as a prerequisite.

2 Proposed Scheme

The proposed coding scheme is shown in Fig. 1. In this model, MC predictive coding is performed first, and then, the residual signal is encoded by an interpolative prediction (IP) method based on an 8 × 8 block [3]. We term this hybrid coding method the “MC+IP synthesis configuration.”

Figure 1

Proposed coding scheme.

2.1 Motion-Compensated Predictive Coding

MC predictive coding in the proposed method is identical to that used for inter-frame MC prediction of P-frames in the H.264 video coding standard. Here, the number of reference frames is set equal to one.

2.2 Interpolative Prediction

The residual signal after motion compensation is coded by an IP method, also known as non-causal prediction, based on 8 × 8 blocks. Given a block whose boundary conditions are known, coding based on an interpolative model can be realized [4].

Given an 8 × 8 block, its pixels x_1 to x_64 are rearranged in conventional scanning order into a 64 × 1 vector X_n; multiplying X_n by the predictive matrix C then yields the prediction errors. The predictive matrix C used in this paper is given in Eqs. (2)–(5). It is similar to the matrix presented in Ref. 3, except for the elements corresponding to the four corner pixels of a block in the intra-block processing.

A configuration of MC predictive coding with interpolative prediction is shown in Fig. 2.

Figure 2

MC predictive coding with interpolative prediction.

In our configuration, the prediction error Y_n can be expressed as

$$ {Y}_n=C\times {X}_n-C\times f\left({X}_{n-1}^{\prime}\right)=C\times \left({X}_n-f\left({X}_{n-1}^{\prime}\right)\right) $$
(1)

where X_n is the input 8 × 8 block signal in 64 × 1 vector form after scanning in the conventional order; X'_{n−1} is the reference vector at the same position in the previously reconstructed frame; f(·) denotes the MC function, so f(X'_{n−1}) is the vector X'_{n−1} after MC processing; and C is the 64 × 64 predictive matrix, which can be represented as follows:

$$ C=\begin{bmatrix} A_1 & & & & O\\ A_3 & A_2 & A_3 & & \\ & \ddots & \ddots & \ddots & \\ & & A_3 & A_2 & A_3\\ O & & & & A_1 \end{bmatrix} $$
(2)
$$ A_1=\begin{bmatrix} 1 & & & & \\ -\frac{1}{2} & 1 & -\frac{1}{2} & & \\ & \ddots & \ddots & \ddots & \\ & & -\frac{1}{2} & 1 & -\frac{1}{2}\\ & & & & 1 \end{bmatrix} $$
(3)
$$ A_2=\begin{bmatrix} 1 & & & & \\ -\frac{1}{4} & 1 & -\frac{1}{4} & & \\ & \ddots & \ddots & \ddots & \\ & & -\frac{1}{4} & 1 & -\frac{1}{4}\\ & & & & 1 \end{bmatrix} $$
(4)
$$ A_3=\begin{bmatrix} -\frac{1}{2} & & & & \\ & -\frac{1}{4} & & & \\ & & \ddots & & \\ & & & -\frac{1}{4} & \\ & & & & -\frac{1}{4} \end{bmatrix} $$
(5)

Here, A_1, A_2, and A_3 are 8 × 8 matrices, as shown in Eqs. (3)–(5). For further details of matrix C, see the Appendix.
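For concreteness, the construction of C from Eqs. (2)–(5) and the prediction of Eq. (1) can be sketched as follows. This is an illustrative NumPy sketch, not the authors' implementation; the scanning order is assumed to be row-major, and the `banded` helper is our own naming.

```python
import numpy as np

def banded(main, off, n=8):
    """n x n matrix with 1 on the first and last diagonal entries and
    the stencil [off, main, off] on interior rows (form of A1 and A2)."""
    m = np.zeros((n, n))
    m[0, 0] = m[n - 1, n - 1] = 1.0
    for i in range(1, n - 1):
        m[i, i - 1], m[i, i], m[i, i + 1] = off, main, off
    return m

A1 = banded(1.0, -0.5)              # Eq. (3)
A2 = banded(1.0, -0.25)             # Eq. (4)
A3 = np.diag([-0.5] + [-0.25] * 7)  # Eq. (5), as printed: diagonal matrix

# Assemble the 64 x 64 block-tridiagonal matrix C of Eq. (2):
# A1 on the first and last block rows, [A3 A2 A3] on interior block rows.
C = np.zeros((64, 64))
C[:8, :8] = A1
C[56:, 56:] = A1
for r in range(1, 7):
    C[r*8:(r+1)*8, (r-1)*8:r*8] = A3
    C[r*8:(r+1)*8, r*8:(r+1)*8] = A2
    C[r*8:(r+1)*8, (r+1)*8:(r+2)*8] = A3

def prediction_error(x_block, mc_ref_block):
    """Eq. (1): Y_n = C (X_n - f(X'_{n-1})); blocks given as 8x8 arrays,
    the MC function f is assumed already applied to mc_ref_block."""
    return C @ (x_block.ravel() - mc_ref_block.ravel())
```

Because C is applied to the difference, Eq. (1) is linear: predicting the MC residual is the same as subtracting the interpolative predictions of the two blocks.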

2.3 Optimal Quantization Scheme

The difference signals output by the interpolative process, which correspond to IP errors of MC residuals, are sequentially input into the feedback quantizer [5]. Accordingly, the coding errors resulting from power expansion in the inter-block processing, caused by the use of a non-orthogonal system, can be suppressed [5, 6].

2.4 Conditional Pel Replenishment

As already mentioned, because an OT is not employed in our method, energy is not concentrated in one location, but is distributed throughout an entire block. Consequently, determining whether pixels should be quantized is an issue. For this reason, we have introduced conditional pel replenishment into the scheme.

As shown in Fig. 3, the data of a previously decoded frame F n − 1 , used for motion compensation, can also be used as a reference for conditional replenishment pixel control before the current pixel data are quantized. As a result of pel replenishment, transmission bandwidth is constrained.

Figure 3

Conditional pel replenishment.

Specifically, the decoded data of the previous frame are also processed by IP based on 8 × 8 blocks to obtain a set of reference values; whereupon these values are compared with a predefined pixel threshold (PTH). Because the reference values are obtained from the decoded data, this conditional pixel replenishment can be achieved without additional overhead information.

On the other hand, the differences between the IP outputs of the current and previous frames (the reference values) are compared with a 4 × 4 sub-block threshold (BTH), obtained by preliminary experiments. In this way, we determine whether the pixels in each 4 × 4 sub-block of the 8 × 8 block under consideration in the current frame should be quantized. At this stage, the decision depends not only on the decoded data of the previous frame and the motion-vector information (both already available at the decoder) but also on the current 4 × 4 sub-block data, which have not yet been transferred to the decoder; it is therefore necessary to signal, for each 4 × 4 sub-block, whether it is to be quantized. As a result, this conditional pixel replenishment can be achieved with 1 bit (ON/OFF) of additional overhead information per 16 pixels (one 4 × 4 sub-block).
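The BTH decision per sub-block might be sketched as follows. This is our reading of the scheme, not the authors' code; in particular, how the IP-output differences are aggregated within a sub-block is an assumption (here, the sum of absolute differences).

```python
import numpy as np

def subblock_significance(cur_ip, ref_ip, bth):
    """Compare the IP outputs of the current block (cur_ip) against the
    reference values derived from the decoded previous frame (ref_ip),
    both 8x8 arrays. Returns a 2x2 boolean map: True marks a significant
    4x4 sub-block whose pixels are to be quantized (this corresponds to
    the 1-bit ON/OFF flag signalled per sub-block)."""
    sig = np.zeros((2, 2), dtype=bool)
    for by in range(2):
        for bx in range(2):
            cur = cur_ip[by*4:(by+1)*4, bx*4:(bx+1)*4]
            ref = ref_ip[by*4:(by+1)*4, bx*4:(bx+1)*4]
            # aggregate the IP-output differences and compare with BTH
            sig[by, bx] = np.abs(cur - ref).sum() > bth
    return sig
```

Sub-blocks whose map entry is False are skipped by the quantizer, which is what constrains the transmission bandwidth.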

A threshold replenishment algorithm for adaptive vector quantization was first proposed by Fowler [7] in 1998, and has subsequently been used in various coding technologies [8–11]. The conditional pixel replenishment method used in our scheme is based on that algorithm but requires no codebook, because quantization is performed not on vectors but on each individual pixel of an 8 × 8 block. Furthermore, the reference values are the outputs of IP, not a distortion measure between a code-vector and a quantization input vector [7]. Since no per-block distortion measure is computed, the computational time of the proposed method is less than that of Ref. 6.

However, the threshold values here are predefined and must be modified according to each frame of a video sequence. Improving the threshold selection process is considered to be an area for future research.

2.5 Rotation Scanning

In order to perform pixel replenishment in the spatial domain, we propose a new approach for improving image quality: the input 8 × 8 block signal is reordered to fit this sub-block system.

In our feedback quantization system, the power of a coding error can be expressed as

$$ E=\sum_i \left({f}_i\cdot {\sigma}_{q_i}^2\right) $$
(6)

Here, f_i is the feedback coefficient for one 8 × 8 block, and σ²_{q_i} is the power of the quantization error. When 4 × 4 sub-block conditional pixel replenishment is performed, non-significant sub-blocks are not quantized; as a result, the power of the quantization error in (6) is replaced by the power of the prediction error. Generally, the power of the quantization error is less than that of the prediction error. Therefore, if we can reduce the value of f_i at the non-significant sub-block positions, we can suppress the increase in coding error power in (6). According to Ref. 4, the value of f_i is determined by three matrices: the predictive matrix C_I, the scanning-order matrix P, and the transform matrix D. Because no OT is used in our system, matrix D is already fixed; the predictive matrix C_I, which defines the prediction error, is likewise fixed. The scanning order is therefore the remaining degree of freedom.

When the scanning order is 4 × 4 × 4, as shown in Fig. 4, the resulting 8 × 8 block of f_i values is shown in Table 1.

Figure 4

4 × 4 × 4 scanning order.

Table 1 8 × 8 block of f_i values.

The value of Σf_i for each sub-block is as follows:

$$ \begin{array}{ll} \#1 & \sum f_i = 19.4527\\ \#2 & \sum f_i = 26.2304\\ \#3 & \sum f_i = 26.8412\\ \#4 & \sum f_i = 40.1022 \end{array} $$
(7)

Therefore, after deciding whether each 4 × 4 sub-block is to be quantized, the input signal must be reordered, according to the definition of distortion in a feedback quantization system (Eq. (6)), so that non-significant sub-blocks appear as early as possible in the scanning order, where Σ(f_i) is smallest. For example, a non-significant sub-block should not be left at position #4, because Σ(f_i) at position #4 is the largest among all the sub-blocks. Table 2 depicts this processing.
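The effect of this reordering on Eq. (6) can be illustrated with a small sketch, not the authors' code. The Σf_i values are those of Eq. (7); the error powers σ²_q and σ²_p below are hypothetical.

```python
# Illustration of Eq. (6): total coding error E = sum(f_i * sigma_i^2),
# aggregated per 4x4 sub-block using the sums of f_i from Eq. (7).
f_sum = {1: 19.4527, 2: 26.2304, 3: 26.8412, 4: 40.1022}  # Eq. (7)

# Hypothetical error powers: an unquantized (non-significant) sub-block
# keeps its larger prediction-error power instead of the quantization error.
sigma_q2, sigma_p2 = 10.0, 100.0

def total_error(unquantized_pos):
    """Eq. (6) when exactly one sub-block is left unquantized."""
    return sum(f * (sigma_p2 if pos == unquantized_pos else sigma_q2)
               for pos, f in f_sum.items())
```

Since σ²_p exceeds σ²_q, `total_error(1)` is the smallest of the four cases: placing the unquantized sub-block at the position with the smallest Σf_i suppresses the error expansion, which is what the rotation operations aim to achieve.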

Table 2 Rotation processing for all cases.

In Table 2, we show which rotation operation should be applied for each insignificant/significant sub-block pattern (expressed as 0 and 1 at the four positions of the 4 × 4 sub-block grid). L indicates a left rotation (counterclockwise); R indicates a right rotation (clockwise); R2 indicates two right rotations; and “none” indicates no rotation. There are, however, two special cases in this table, marked with gray backgrounds: 0110→0111 and 1001→0111. In these cases, after the rotation operation, the last sub-block, #4, is forcibly set to 1 on the basis of preliminary experimental results.
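A possible implementation of the rotation selection is sketched below; this is our reading, not the authors' code. We assume that positions #1 to #4 are laid out row-major in the 2 × 2 grid of 4 × 4 sub-blocks, that a quarter-turn of the whole 8 × 8 block moves the sub-blocks accordingly, and we reuse the Σf_i sums of Eq. (7); the exact entries of Table 2 (including its two special cases) are not reproduced.

```python
import numpy as np

# Sum of f_i for sub-block positions #1..#4 from Eq. (7), assumed to be
# laid out row-major in the 2x2 grid of 4x4 sub-blocks.
POS_F = np.array([[19.4527, 26.2304],
                  [26.8412, 40.1022]])

def best_rotation(block, sig):
    """Rotate an 8x8 block (in quarter turns) so that non-significant
    (unquantized) sub-blocks land on positions with small sum(f_i).
    `sig` is the 2x2 boolean significance map (True = quantized).
    Returns the rotated block and the number of counterclockwise turns."""
    best_cost, best_k = np.inf, 0
    for k in range(4):
        # cost = sum of f_i over the positions left unquantized
        cost = POS_F[~np.rot90(sig, k)].sum()
        if cost < best_cost:
            best_cost, best_k = cost, k
    return np.rot90(block, best_k), best_k
```

With one non-significant sub-block, this search simply moves it to position #1, matching the reasoning behind Eq. (6).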

The effects of improving coding efficiency by these rotation operations are shown in Table 3.

Table 3 Coding efficiency improvement by rotation operation.

2.6 Features of the Proposed Scheme

The proposed inter-frame coding scheme has three characteristic features:

  • OT is not used.

  • Conditional pel replenishment is performed without additional overhead information.

  • A new hybrid coding framework, MC+IP combined with feedback quantization, is employed.

OT is not used in our coding scheme; instead, an IP method has been adopted. An OT does not exploit the relations between pixels; it merely transforms the signal from the spatial domain to the frequency domain [12–23]. In contrast, an IP method can compress the signal by eliminating the correlation between pixels within a frame [24–28]. Generally, the MC prediction error has little temporal correlation; however, spatial correlation still exists. Accordingly, we have replaced the OT with an IP method, because in Ref. 3 we showed that IP can serve as a form of “transform coding.”

However, the non-orthogonality of the IP matrix means that the power-expansion problem, in which coding errors are amplified at decoding, exists in the proposed method. Feedback quantization is therefore necessary.

3 Simulation

We now present simulation results obtained with the proposed P-frame coding scheme, and compare the prediction errors of our method with those of traditional causal prediction, in this case the H.264 baseline [29]. The experiments were designed to make two major points. First, we show the coding efficiency of MC+IP without a DCT. Second, we emphasize the importance of the conditional pel replenishment and rotation techniques in the MC+IP configuration. We therefore also give results without conditional pel replenishment and rotation (denoted “proposed (NO BTH)”) and without rotation only (denoted “proposed (NO ROT)”).

To eliminate the influence of the I-frame, whose decoded image is used as the first reference frame for motion compensation, the first frame of each test sequence was coded for both methods by the H.264 baseline I-frame coder under the same parameter values.

The first twenty frames of the foreman CIF (352 × 288) sequence and the first ten frames of three other CIF test sequences (bus, flower, and highway) from the YUV Video Sequences website (http://trace.eas.asu.edu/yuv/) were used. The MC coding parameters (identical for both methods) were set as follows:

  • Search range: 32 pixels

  • Total number of reference frames: 1

  • ME scheme: fast full search

  • PSliceSearch8 × 8: 1 (enabled; all other partition types disabled)

  • DisableIntraInInter: 1 (Disable Intra mode for inter slice)

  • Rate-distortion-optimized mode decision: used

Besides these parameters, the threshold values for the proposed scheme have been adapted to the input frames.

3.1 Simulation of Prediction Errors

A simulation of the prediction errors is shown first. This comparison shows the distribution of errors obtained by the proposed scheme and whether spatial correlation remains in the signal after motion compensation. Table 4 lists several statistical values for both methods that reflect the distribution of their respective residual signals.

Table 4 Statistical values of residual signals.

In this table, “PM” stands for “proposed method” and “Old” for traditional causal prediction; entropy is calculated from Shannon theory, and average error is the average of the two signal powers. The number of 0’s indicates how many pixels were predicted exactly.
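The entropy figures of Table 4 correspond in principle to a first-order Shannon entropy estimate such as the following sketch (not the authors' code; NumPy assumed):

```python
import numpy as np

def entropy_bits_per_pixel(signal):
    """First-order Shannon entropy of a discrete residual signal,
    in bits/pixel: H = -sum(p * log2(p)) over the symbol histogram."""
    _, counts = np.unique(np.asarray(signal), return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())
```

A residual concentrated around zero has few distinct symbols with skewed probabilities, hence low entropy, which is the sense in which Table 4 compares the two methods.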

Figure 5 shows the prediction errors for the first P-frame at each pixel position when using the first frame of foreman.cif as a test image. Here, the x-axis represents pixel position, and the y-axis represents the value of prediction error.

Figure 5

Simulation results of inter-frame prediction errors for both the proposed method and traditional causal prediction (strictly speaking, the prediction errors of the proposed method are the output of the interpolative prediction, not merely the output of MC).

The distribution of the prediction errors for the proposed method is clearly more concentrated around zero than that for the old method, and the residual signal power is approximately 37.6 % lower under the proposed method. Therefore, we believe that our scheme provides improved coding efficiency if an appropriate quantization method is employed.

3.2 Comparison of Coding Efficiency

Next, we show the coding results for the proposed method. As stated before, the threshold values in our method must be changed for each frame, and these thresholds are obtained by a preliminary experiment. Since PTH and BTH typically lie within a certain range, the results presented here are likewise limited. For this reason, the conditional pel replenishment used here still has potential for improvement.

Figures 6, 7, 8 and 9 compare the coding efficiency of the two methods, at almost equal entropy for the old and proposed methods, for the test sequences foreman, bus, flower, and highway, respectively. ρ_h and ρ_v indicate the correlation of the test images in the horizontal and vertical directions, respectively.

Figure 6

Coding efficiency for foreman.cif, ρh = 0.9726 ρv = 0.9583.

Figure 7

Coding efficiency for bus.cif, ρh = 0.8989 ρv = 0.8414.

Figure 8

Coding efficiency for flower.cif, ρh = 0.9087 ρv = 0.7893.

Figure 9

Coding efficiency for highway.cif, ρh = 0.9964 ρv = 0.8710.

In these plots, the horizontal axis expresses the frame number of the coded frames (0–20 frames for foreman.cif and 0–10 frames for the other test sequences), and the vertical axes express the entropy (bits/pixel; upper plot) and peak signal-to-noise ratio (PSNR; dB; lower plot) of each test image. Although the number of bits required by the proposed method (its entropy) is approximately equal to that of the old method (here, the H.264 baseline), the PSNR of the proposed method is consistently higher (the average improvement is about 0.3–2 dB).
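For reference, the PSNR plotted here is the standard objective measure; a minimal sketch, assuming 8-bit luma samples (peak value 255):

```python
import numpy as np

def psnr(original, decoded, peak=255.0):
    """Peak signal-to-noise ratio in dB between two same-size images:
    10 * log10(peak^2 / MSE)."""
    a = np.asarray(original, dtype=float)
    b = np.asarray(decoded, dtype=float)
    mse = np.mean((a - b) ** 2)
    return 10.0 * np.log10(peak * peak / mse)
```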

Table 5 shows the coding results when the proposed method is applied to the bus CIF video sequence as an example. Here, Q(S) is the average entropy over all pixels in the frame, and H(S) represents the overhead (motion-vector) entropy.

Table 5 Coding results for proposed method.

3.3 Simulation of Motion Vectors

Table 6 lists the entropy of motion vectors in each frame for the two methods. Moreover, Fig. 10 shows the decoded frame obtained by the two methods for four test sequences, where the lines denote the motion vectors. The proposed method clearly requires less motion-vector information than the causal method. Furthermore, the corresponding decoded frames without the motion vectors are shown in Fig. 11.

Table 6 Entropy of motion vectors.
Figure 10

Comparison of motion vectors for the decoded frame obtained by traditional coding scheme (left) and the proposed scheme (right).

Figure 11

Comparison of decoded frame obtained by traditional coding scheme (left) and the proposed scheme (right).

3.4 Effect of Adaptive Coding Between Intra/inter Modes

Figure 12 shows the simulation results of adaptive coding between the inter/intra coding modes. Insertion of the PinP configuration (Picture-in-Picture configuration, as shown in Fig. 13) starts at the first frame and completes at the eighth frame. In Fig. 12, the PSNR of “inter mode only” indicates the PSNR obtained when all frames from the second frame onward are forced to be coded in inter-frame mode and the quantity of transmitted bits is almost the same as that of the proposed adaptive scheme.

Figure 12

Coding efficiency for adaptive mode selection.

Figure 13

PinP configuration.

If the coding mode is fixed to inter coding, as shown here, the coding efficiency begins to improve at the second P-frame and, after two coded frames, eventually approaches that of the adaptive system. The arrow in the figure indicates how much improvement is achieved by the proposed adaptive scheme.

On the other hand, because an adaptive intra/inter mode-selection method with overhead information is applied in the old methods, a significant decrease in PSNR does not occur when the PinP test sequence shown in Fig. 13 is used.

In addition, in a traditional coding scheme such as H.264, because the intra-frame coding mode can be selected at any time throughout the coding period, the overhead for every macro-block is approximately 0.01 bits/pixel more than in the proposed method. However, a disadvantage of the proposed method is that its adaptive mode selection is not sufficiently flexible; the authors intend to improve this point in future work.

4 Conclusion

In this paper, we have proposed a new inter-frame coding scheme in which the OT (e.g., a DCT) used in conventional hybrid coding schemes is replaced by a non-causal IP method. Application of this IP method can potentially reduce the signal power a priori. Since IP methods utilize spatial correlations between pixels, we believe them to be more effective than an OT (which merely performs a domain transform) for residual data compressed after MC inter-frame prediction. However, combining our method with an OT is also of great research interest, as discussed in Section 2 of this paper.

Accordingly, we have also introduced conditional pel replenishment to our scheme, adding little additional overhead information. Our model thus has the three characteristic features listed in Section 2.6. The simulation results in Section 3 showed that, over four test sequences, the proposed scheme achieved considerable coding efficiency.

As a first topic for future research, the conditional pel replenishment method utilized in our scheme should be improved. Another open question is whether the proposed scheme can maintain high coding efficiency for larger test sequences.

In conclusion, we have introduced a different approach to P-frame hybrid coding that utilizes spatial correlation for the MC residual signal. Since our hybrid video coding method has achieved high coding efficiency without employing OT, we have shown the feasibility of non-orthogonal transforms for effective coding.