1 Introduction

Video and image data compression has become an increasingly important issue in all areas of computing and communications. Various data encoding techniques can be used to eliminate the redundancy of the color information in images and video frames. Most video codecs and algorithms combine spatial compression within individual images with motion compensation in time. Currently, there are many compression standards. They can be found in a wide range of applications such as cable and land-based transmission channels, video services over satellite, video streaming over the Internet or a local area network, and storage formats. The most popular of these algorithms are MPEG, JPEG and H.26x. The MPEG standard describes a family of compression algorithms for audiovisual data; more details can be found in [5]. The well-known members of the MPEG family are MPEG-1, MPEG-2, and MPEG-4. H.261 is the first of the entire H.26x family of video compression standards. It was designed for handling video transmission in real time. More information about the H.26x family can be found in [1]. The JPEG and JPEG2000 standards are used for image compression with an adjustable compression rate. They are also used for video compression; in that case, these methods compress each movie frame individually.

In the proposed approach we used a Predictive Vector Quantization (PVQ) algorithm to compress a sequence of video frames, which combines two methods: Vector Quantization (VQ) and Differential Pulse Code Modulation (DPCM) [2, 4]. In order to reduce the amount of information needed to store the video stream, we used two types of frames: key frames (intra-frames) and predictive frames (inter-frames). These frames were joined into groups, each containing one key frame and a series of predictive frames, called GOPs (groups of pictures) [14, 17].

The rest of the paper is organized as follows. Section 2 describes the components of our algorithm, including a description of neural image compression and the encoding of color information. In Sect. 3 we discuss our approach to neural video compression. Next, in Sect. 4, the experimental results are presented. The final section covers conclusions and plans for future work.

2 Related Works

2.1 Predictive Vector Quantization

Predictive Vector Quantization is a neural algorithm that extends the differential pulse code modulation (DPCM) scheme with the vector quantization (VQ) method [6, 7]. In PVQ, a neural predictor fulfills the DPCM function, while the codebook is responsible for vector quantization. The successive input vectors V(t) are the macroblocks of the same dimensionality obtained from a frame, where t are the indices of consecutive macroblocks. The predictor's input is the preceding reconstructed macroblock, which after processing yields the predicted vector \(\overline{V}(t)\). The difference between the current macroblock and the predictor output, \(E(t) = V(t) - \overline{V}(t)\), is then calculated and used to select its best approximation \(g_j\) from the codebook \(G = [g_0, g_1, \ldots , g_J]\) using the neural quantizer. The approximation \(g_j\) is then added to the predicted vector, which gives the reconstructed input vector \(\tilde{V}(t) = \overline{V}(t) + g_j\). This vector is later used as the next input to the predictor. The codebook index j is stored in the output stream. In order to decompress the data, the predictor is again used to calculate the predicted vectors, while the stored codebook indices are used to select the corresponding approximations from the codebook; these are combined with the predictor outputs to reconstruct the original image.
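The closed-loop structure of the encoder and decoder can be summarized in a short sketch. Below is a minimal PVQ loop in Python/numpy, assuming a trained predictor function `predict` and a codebook matrix `G` are already available; both names, and the nearest-neighbour quantizer, are illustrative assumptions rather than the exact implementation used in the paper.

```python
import numpy as np

def pvq_encode(macroblocks, predict, G):
    """Encode a sequence of macroblock vectors V(t) with PVQ.

    predict: neural predictor mapping the previous reconstructed
             vector to the predicted vector V_bar(t) (assumed trained).
    G:       codebook; rows are the difference approximations g_0..g_J.
    Returns the list of codebook indices j stored in the stream.
    """
    indices = []
    v_rec = np.zeros_like(macroblocks[0])      # initial reconstruction
    for v in macroblocks:
        v_bar = predict(v_rec)                 # DPCM prediction
        e = v - v_bar                          # error E(t) = V(t) - V_bar(t)
        j = int(np.argmin(np.linalg.norm(G - e, axis=1)))  # best g_j
        v_rec = v_bar + G[j]                   # V_tilde(t) = V_bar(t) + g_j
        indices.append(j)
    return indices

def pvq_decode(indices, predict, G, shape):
    """Rebuild the reconstructed vectors V_tilde(t) from stored indices."""
    v_rec = np.zeros(shape)
    blocks = []
    for j in indices:
        v_rec = predict(v_rec) + G[j]
        blocks.append(v_rec)
    return blocks
```

The decoder repeats the same prediction loop, so encoder and decoder stay synchronized as long as both start from the same initial vector.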

2.2 Encoding Color Information

Most modern hardware uses the RGB color model to display images and video. Unfortunately, this color model is not an efficient way of storing and processing color information. Common standards for image and video compression such as JPEG and MPEG, and color encoding systems like PAL and NTSC, take advantage of the way human vision works. Human eyes are very sensitive to small changes in brightness but far less sensitive to changes in chrominance [15]. For this reason we can use the YC\(_b\)C\(_r\) color space. In this color space, Y is the luminance channel, which describes brightness, and C\(_b\) and C\(_r\) are the chrominance channels, which contain the remaining color information. In our method we used a modified conversion to luminance and chrominance which covers the full range of values from 0 to 255, as RGB color values do. The conversion from the RGB color space to the YC\(_b\)C\(_r\) color space is presented in Eq. 1, while the conversion from YC\(_b\)C\(_r\) back to RGB is presented in Eq. 2 [9, 13].

$$\begin{aligned} \left[ \begin{array}{c} Y \\ C_b \\ C_r \end{array} \right] = \left[ \begin{array}{c} 0 \\ 128 \\ 128 \end{array} \right] + \left[ \begin{array}{rrr} 0.299 & 0.587 & 0.114 \\ -0.169 & -0.331 & 0.500 \\ 0.500 & -0.419 & -0.081 \end{array} \right] \left[ \begin{array}{c} R \\ G \\ B \end{array} \right] \end{aligned}$$
(1)
$$\begin{aligned} \left[ \begin{array}{c} R \\ G \\ B \end{array} \right] = \left[ \begin{array}{rrr} 1.000 & 0.000 & 1.400 \\ 1.000 & -0.343 & -0.711 \\ 1.000 & 1.765 & 0.000 \end{array} \right] \left[ \begin{array}{c} Y \\ C_b - 128 \\ C_r - 128 \end{array} \right] \end{aligned}$$
(2)
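As an illustration, both conversions can be implemented directly from the matrices in Eqs. 1 and 2. The following numpy sketch assumes 8-bit channel values; the clipping to the 0–255 range is our assumption for storage, not part of the equations.

```python
import numpy as np

# Matrices and offset taken directly from Eqs. 1 and 2.
RGB2YCC = np.array([[ 0.299,  0.587,  0.114],
                    [-0.169, -0.331,  0.500],
                    [ 0.500, -0.419, -0.081]])
YCC2RGB = np.array([[1.000,  0.000,  1.400],
                    [1.000, -0.343, -0.711],
                    [1.000,  1.765,  0.000]])
OFFSET = np.array([0.0, 128.0, 128.0])

def rgb_to_ycbcr(rgb):
    """Eq. 1: rgb is an (..., 3) array of 8-bit values."""
    return np.clip(rgb @ RGB2YCC.T + OFFSET, 0, 255)

def ycbcr_to_rgb(ycc):
    """Eq. 2: inverse conversion back to 8-bit RGB."""
    return np.clip((ycc - OFFSET) @ YCC2RGB.T, 0, 255)
```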

Since the human eye is less sensitive to changes in chrominance, we can reduce the amount of information needed to store the C\(_b\) and C\(_r\) channels without significant loss of quality. This operation is known as chroma subsampling and is used in modern lossy compression algorithms [10]. Commonly used variants, such as 4:2:2 and 4:2:0, are presented in Fig. 1.

Fig. 1. Chroma subsampling

3 Proposed Method

In this paper, we propose a method of video compression based on the solutions presented in [3, 8, 11, 12, 13]. Compared to the previous methods, which were based on full-resolution chrominance (4:4:4), our method uses 4:2:0 chroma subsampling in order to reduce the amount of data needed to record frames [15]. In this approach, only the Y channel is stored completely, while the C\(_b\) and C\(_r\) channels are reduced to half their original resolution. Moreover, the C\(_b\) and C\(_r\) channels are combined into one channel before compression. This approach allowed us to reduce the number of sets of codebooks and predictors to two: one for the luminance channel and one for the combined chrominance channel (Fig. 2).

Fig. 2. Scheme of encoding YC\(_b\)C\(_r\) into two channels

Unlike the previous method, which used key frame detection algorithms, this method is based on a fixed-size GOP (group of pictures) [14, 17]. Each GOP consists of a key frame (intra-frame) and subsequent predictive frames (inter-frames). Each key frame is composed of complete frame data encoding the entire frame, together with the two predictors and codebooks used to decode it. Predictive frames do not carry their own predictors and codebooks; these are taken from the most recently encountered key frame. Moreover, instead of encoding the entire frame, predictive frames can encode smaller regions within the frame, taking the remaining information from the previous frame. This allows us to significantly reduce the size of each predictive frame by storing only the changes between subsequent frames (Fig. 3). A simplified sketch of this loop is given below.
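In the sketch, `train_models`, `encode_full_frame`, and `encode_changed_regions` are hypothetical placeholders for the PVQ machinery described in Sect. 2; only the GOP bookkeeping itself is meant literally.

```python
GOP_SIZE = 30  # fixed GOP length, as used in the experiments

def encode_stream(frames, train_models, encode_full_frame,
                  encode_changed_regions):
    stream, models = [], None
    for t, frame in enumerate(frames):
        if t % GOP_SIZE == 0:
            # Key frame: store predictors/codebooks and the whole frame.
            models = train_models(frame)
            stream.append(("key", models, encode_full_frame(frame, models)))
        else:
            # Predictive frame: reuse the key frame's models and store
            # only the regions that changed since the previous frame.
            changed = encode_changed_regions(frame, frames[t - 1], models)
            stream.append(("pred", changed))
    return stream
```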

4 Experimental Results

For the purpose of our research, we conducted two experiments to test the efficiency of our algorithm. We used sequences of frames from the publicly available video “Elephants Dream” at a resolution of 1280\(\,\times \,\)720 [16]. In all our tests we converted the frames into the YC\(_b\)C\(_r\) color space. In 4:4:4 chroma subsampling, frames were encoded using three sets of predictors and codebooks; this corresponds to the method used in our previous works. In 4:2:0 chroma subsampling, the C\(_b\) and C\(_r\) channels were first reduced by averaging 2\(\,\times \,\)2 squares and then combined into one 8-element channel. Due to this modification, the algorithm only needed one standard set of predictor and codebook for the Y channel and one reduced set for the combined C\(_b\)C\(_r\) channel. This allowed us to reduce the number of indices stored per macroblock from 3 to 2 and reduced the computational complexity of the compression.
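For a 4\(\,\times \,\)4 luminance macroblock, the two reduced chrominance planes each contribute one 2\(\,\times \,\)2 block, which together give the 8-element combined vector mentioned above. A minimal sketch of this reduction and combination step, under the assumption that the planes have even dimensions:

```python
import numpy as np

def subsample_420(cb, cr):
    """Average every 2x2 square of the chrominance planes (4:2:0)."""
    h, w = cb.shape
    cb2 = cb.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    cr2 = cr.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return cb2, cr2

def combined_chroma_vector(cb2, cr2, by, bx):
    """For the 4x4 luma macroblock at block position (by, bx), the
    half-resolution planes yield one 2x2 block each; concatenated they
    form the 8-element vector for the second predictor/codebook set."""
    cb_blk = cb2[by * 2:(by + 1) * 2, bx * 2:(bx + 1) * 2]
    cr_blk = cr2[by * 2:(by + 1) * 2, bx * 2:(bx + 1) * 2]
    return np.concatenate([cb_blk.ravel(), cr_blk.ravel()])  # 8 elements
```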

Fig. 3. Video compression algorithm

In the first experiment, we tested the efficiency of our algorithm in two chroma subsampling modes, 4:4:4 and 4:2:0. In both cases we used a preset macroblock size of 4\(\,\times \,\)4. The experiment showed that the introduction of chroma subsampling did not significantly degrade the video quality, as can be seen in Fig. 4.

Fig. 4. Differences between images before compression and after compression with 4:4:4 and 4:2:0 subsampling

We compared frames compressed using 4:4:4 chroma subsampling to frames compressed using 4:2:0 subsampling by means of the Peak Signal-to-Noise Ratio (PSNR). Figure 5 shows that both methods result in very similar PSNR values.
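PSNR here follows the standard definition for 8-bit images; a minimal sketch of the metric we report:

```python
import numpy as np

def psnr(original, compressed, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB between two 8-bit frames."""
    diff = original.astype(np.float64) - compressed.astype(np.float64)
    mse = np.mean(diff ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```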

Fig. 5. PSNR of compressed video frames in 4:4:4 and 4:2:0 subsampling

In the second experiment we compared the video quality after compression using predictive frames. Instead of a key frame detection algorithm we used a fixed GOP with a length of 30 frames. The experiment showed that there is no significant loss of quality when predictive frames are used instead of encoding full frames, as can be seen in Fig. 6. The quality was again compared with the version without predictive frames using PSNR (Fig. 7). The introduction of 4:2:0 subsampling and predictive frames resulted in about a 48% reduction of compressed data, which means a considerably increased compression ratio.

Fig. 6. Differences between images before compression and after compression with and without inter-frames

Fig. 7. PSNR of compressed video frames in 4:2:0 and 4:2:0 with inter-frames

5 Conclusions and Future Work

In this paper we showed that the presented method can be successfully used in video compression. The combination of 4:2:0 chroma subsampling and the introduction of predictive frames allowed us to significantly reduce the size of the compressed video with respect to the previous approach. Thanks to encoding the YC\(_b\)C\(_r\) channels into two channels, we were able to reduce the number of predictors and codebooks from the original three to two. Predictive frames made it possible to eliminate unnecessary compression of regions that did not change between consecutive frames, further reducing the size of the compressed data. These changes also resulted in reduced processing time.

Our future work will concentrate on improvements to the algorithm, especially optimizing the creation of predictive frames and enhancing the quality of the video. In our research, we want to find and compare various schemes for encoding the reduced YC\(_b\)C\(_r\) channels to see which of them can improve quality and which can improve the compression ratio. Finally, we will try to compare our results with known compression standards, such as MPEG.